This is the EDA for the 2018 KDD Cup.

Checker

In [67]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
path='../ml_dataset/2018_kdd_cup_dataset/'
pd.read_csv(path+"beijing_17_18_aq.csv")
Out[67]:
stationId utc_time PM2.5 PM10 NO2 CO O3 SO2
0 aotizhongxin_aq 2017-01-01 14:00:00 453.0 467.0 156.0 7.2 3.0 9.0
1 aotizhongxin_aq 2017-01-01 15:00:00 417.0 443.0 143.0 6.8 2.0 8.0
2 aotizhongxin_aq 2017-01-01 16:00:00 395.0 467.0 141.0 6.9 3.0 8.0
3 aotizhongxin_aq 2017-01-01 17:00:00 420.0 484.0 139.0 7.4 3.0 9.0
4 aotizhongxin_aq 2017-01-01 18:00:00 453.0 520.0 157.0 7.6 4.0 9.0
5 aotizhongxin_aq 2017-01-01 19:00:00 429.0 NaN 141.0 6.5 3.0 9.0
6 aotizhongxin_aq 2017-01-01 20:00:00 211.0 NaN 110.0 3.3 NaN 11.0
7 aotizhongxin_aq 2017-01-01 21:00:00 116.0 NaN 87.0 2.2 4.0 13.0
8 aotizhongxin_aq 2017-01-01 22:00:00 51.0 NaN 58.0 1.3 26.0 14.0
9 aotizhongxin_aq 2017-01-01 23:00:00 38.0 NaN 55.0 1.1 28.0 14.0
10 aotizhongxin_aq 2017-01-02 00:00:00 21.0 NaN 40.0 0.7 42.0 16.0
11 aotizhongxin_aq 2017-01-02 01:00:00 16.0 NaN 40.0 0.8 44.0 18.0
12 aotizhongxin_aq 2017-01-02 02:00:00 23.0 NaN 42.0 0.7 45.0 17.0
13 aotizhongxin_aq 2017-01-02 03:00:00 18.0 NaN 30.0 0.6 59.0 14.0
14 aotizhongxin_aq 2017-01-02 04:00:00 58.0 247.0 76.0 2.4 46.0 12.0
15 aotizhongxin_aq 2017-01-02 05:00:00 176.0 NaN 99.0 0.3 42.0 17.0
16 aotizhongxin_aq 2017-01-02 06:00:00 109.0 211.0 86.0 2.6 58.0 14.0
17 aotizhongxin_aq 2017-01-02 07:00:00 267.0 NaN 125.0 4.2 53.0 19.0
18 aotizhongxin_aq 2017-01-02 08:00:00 260.0 NaN 115.0 3.9 56.0 18.0
19 aotizhongxin_aq 2017-01-02 09:00:00 212.0 NaN 112.0 3.6 50.0 14.0
20 aotizhongxin_aq 2017-01-02 10:00:00 183.0 NaN 111.0 0.3 33.0 14.0
21 aotizhongxin_aq 2017-01-02 11:00:00 136.0 NaN 89.0 2.3 41.0 14.0
22 aotizhongxin_aq 2017-01-02 12:00:00 137.0 NaN 89.0 2.2 35.0 13.0
23 aotizhongxin_aq 2017-01-02 13:00:00 125.0 NaN 90.0 2.1 28.0 13.0
24 aotizhongxin_aq 2017-01-02 14:00:00 153.0 NaN 108.0 2.5 13.0 14.0
25 aotizhongxin_aq 2017-01-02 15:00:00 165.0 NaN 126.0 3.1 2.0 14.0
26 aotizhongxin_aq 2017-01-02 16:00:00 225.0 NaN 148.0 4.9 4.0 23.0
27 aotizhongxin_aq 2017-01-02 17:00:00 275.0 NaN 158.0 6.4 7.0 28.0
28 aotizhongxin_aq 2017-01-02 18:00:00 293.0 NaN 155.0 6.4 5.0 17.0
29 aotizhongxin_aq 2017-01-02 19:00:00 293.0 NaN 153.0 6.2 5.0 16.0
... ... ... ... ... ... ... ... ...
310980 zhiwuyuan_aq 2018-01-30 10:00:00 NaN NaN NaN NaN NaN NaN
310981 zhiwuyuan_aq 2018-01-30 11:00:00 NaN NaN NaN NaN NaN NaN
310982 zhiwuyuan_aq 2018-01-30 12:00:00 NaN NaN NaN NaN NaN NaN
310983 zhiwuyuan_aq 2018-01-30 13:00:00 NaN NaN NaN NaN NaN NaN
310984 zhiwuyuan_aq 2018-01-30 14:00:00 NaN NaN NaN NaN NaN NaN
310985 zhiwuyuan_aq 2018-01-30 15:00:00 NaN NaN NaN NaN NaN NaN
310986 zhiwuyuan_aq 2018-01-30 16:00:00 NaN NaN NaN NaN NaN NaN
310987 zhiwuyuan_aq 2018-01-30 17:00:00 NaN NaN NaN NaN NaN NaN
310988 zhiwuyuan_aq 2018-01-30 18:00:00 NaN NaN NaN NaN NaN NaN
310989 zhiwuyuan_aq 2018-01-30 19:00:00 NaN NaN NaN NaN NaN NaN
310990 zhiwuyuan_aq 2018-01-30 20:00:00 NaN NaN NaN NaN NaN NaN
310991 zhiwuyuan_aq 2018-01-30 21:00:00 NaN NaN NaN NaN NaN NaN
310992 zhiwuyuan_aq 2018-01-30 22:00:00 NaN NaN NaN NaN NaN NaN
310993 zhiwuyuan_aq 2018-01-30 23:00:00 NaN NaN NaN NaN NaN NaN
310994 zhiwuyuan_aq 2018-01-31 00:00:00 NaN NaN NaN NaN NaN NaN
310995 zhiwuyuan_aq 2018-01-31 01:00:00 NaN NaN NaN NaN NaN NaN
310996 zhiwuyuan_aq 2018-01-31 02:00:00 NaN NaN NaN NaN NaN NaN
310997 zhiwuyuan_aq 2018-01-31 03:00:00 NaN NaN NaN NaN NaN NaN
310998 zhiwuyuan_aq 2018-01-31 04:00:00 NaN NaN NaN NaN NaN NaN
310999 zhiwuyuan_aq 2018-01-31 05:00:00 NaN NaN NaN NaN NaN NaN
311000 zhiwuyuan_aq 2018-01-31 06:00:00 NaN NaN NaN NaN NaN NaN
311001 zhiwuyuan_aq 2018-01-31 07:00:00 NaN NaN NaN NaN NaN NaN
311002 zhiwuyuan_aq 2018-01-31 08:00:00 NaN NaN NaN NaN NaN NaN
311003 zhiwuyuan_aq 2018-01-31 09:00:00 NaN NaN NaN NaN NaN NaN
311004 zhiwuyuan_aq 2018-01-31 10:00:00 NaN NaN NaN NaN NaN NaN
311005 zhiwuyuan_aq 2018-01-31 11:00:00 NaN NaN NaN NaN NaN NaN
311006 zhiwuyuan_aq 2018-01-31 12:00:00 NaN NaN NaN NaN NaN NaN
311007 zhiwuyuan_aq 2018-01-31 13:00:00 NaN NaN NaN NaN NaN NaN
311008 zhiwuyuan_aq 2018-01-31 14:00:00 NaN NaN NaN NaN NaN NaN
311009 zhiwuyuan_aq 2018-01-31 15:00:00 NaN NaN NaN NaN NaN NaN

311010 rows × 8 columns
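The tail of this output is entirely NaN for `zhiwuyuan_aq`, so it may help to measure gaps per station before modelling. Below is a minimal sketch that assumes only the `stationId` and `utc_time` columns shown above. `missing_report` and `missing_hours` are hypothetical helper names. The first returns the share of NaN values per pollutant. The second counts hours that have no row at all, which is a different kind of gap.

```python
import pandas as pd


def missing_report(aq: pd.DataFrame) -> pd.DataFrame:
    """Share of NaN values per station and pollutant (0.0 = complete, 1.0 = empty)."""
    pollutants = [c for c in aq.columns if c not in ("stationId", "utc_time")]
    return aq[pollutants].isnull().groupby(aq["stationId"]).mean()


def missing_hours(aq: pd.DataFrame) -> pd.Series:
    """Hourly timestamps with no row at all, per station, over the file's overall time span."""
    times = pd.to_datetime(aq["utc_time"])
    full = pd.date_range(times.min(), times.max(), freq=pd.offsets.Hour())
    present = times.groupby(aq["stationId"]).nunique()
    return len(full) - present


# Usage with the file loaded above:
# aq = pd.read_csv(path + "beijing_17_18_aq.csv")
# missing_report(aq).sort_values("PM2.5", ascending=False).head()
# missing_hours(aq)
```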

In [5]:
pd.read_csv(path+"beijing_201802_201803_aq.csv")
Out[5]:
stationId utc_time PM2.5 PM10 NO2 CO O3 SO2
0 aotizhongxin_aq 2018-01-31 16:00:00 49.0 82.0 90.0 0.9 6.0 10.0
1 aotizhongxin_aq 2018-01-31 17:00:00 47.0 80.0 90.0 0.9 5.0 10.0
2 aotizhongxin_aq 2018-01-31 18:00:00 46.0 91.0 91.0 1.3 5.0 28.0
3 aotizhongxin_aq 2018-01-31 19:00:00 60.0 95.0 85.0 2.0 6.0 38.0
4 aotizhongxin_aq 2018-01-31 20:00:00 52.0 91.0 81.0 1.9 5.0 30.0
5 aotizhongxin_aq 2018-01-31 21:00:00 38.0 80.0 72.0 1.2 4.0 14.0
6 aotizhongxin_aq 2018-01-31 22:00:00 30.0 70.0 70.0 0.9 3.0 8.0
7 aotizhongxin_aq 2018-01-31 23:00:00 29.0 75.0 73.0 0.8 3.0 10.0
8 aotizhongxin_aq 2018-02-01 00:00:00 26.0 79.0 73.0 0.9 4.0 7.0
9 aotizhongxin_aq 2018-02-01 01:00:00 28.0 95.0 73.0 1.1 7.0 10.0
10 aotizhongxin_aq 2018-02-01 02:00:00 38.0 96.0 59.0 1.1 21.0 20.0
11 aotizhongxin_aq 2018-02-01 03:00:00 43.0 102.0 50.0 1.1 34.0 22.0
12 aotizhongxin_aq 2018-02-01 04:00:00 48.0 110.0 47.0 1.1 42.0 20.0
13 aotizhongxin_aq 2018-02-01 05:00:00 42.0 120.0 29.0 0.7 62.0 13.0
14 aotizhongxin_aq 2018-02-01 06:00:00 32.0 124.0 21.0 0.5 73.0 12.0
15 aotizhongxin_aq 2018-02-01 07:00:00 33.0 131.0 19.0 0.4 78.0 11.0
16 aotizhongxin_aq 2018-02-01 08:00:00 29.0 110.0 21.0 0.4 75.0 10.0
17 aotizhongxin_aq 2018-02-01 09:00:00 27.0 100.0 24.0 0.4 69.0 10.0
18 aotizhongxin_aq 2018-02-01 10:00:00 24.0 90.0 22.0 0.4 71.0 5.0
19 aotizhongxin_aq 2018-02-01 11:00:00 16.0 76.0 23.0 0.3 67.0 3.0
20 aotizhongxin_aq 2018-02-01 12:00:00 14.0 49.0 23.0 0.3 66.0 3.0
21 aotizhongxin_aq 2018-02-01 13:00:00 9.0 38.0 15.0 0.3 71.0 2.0
22 aotizhongxin_aq 2018-02-01 14:00:00 6.0 35.0 11.0 0.2 74.0 2.0
23 aotizhongxin_aq 2018-02-01 15:00:00 6.0 32.0 10.0 0.2 74.0 2.0
24 aotizhongxin_aq 2018-02-01 16:00:00 9.0 38.0 18.0 0.2 61.0 2.0
25 aotizhongxin_aq 2018-02-01 17:00:00 10.0 46.0 18.0 0.2 62.0 2.0
26 aotizhongxin_aq 2018-02-01 18:00:00 8.0 34.0 24.0 0.2 54.0 2.0
27 aotizhongxin_aq 2018-02-01 19:00:00 8.0 27.0 22.0 0.2 55.0 3.0
28 aotizhongxin_aq 2018-02-01 20:00:00 8.0 25.0 20.0 0.3 NaN 4.0
29 aotizhongxin_aq 2018-02-01 21:00:00 7.0 34.0 16.0 0.2 56.0 4.0
... ... ... ... ... ... ... ... ...
49390 zhiwuyuan_aq 2018-03-30 10:00:00 NaN NaN NaN NaN NaN NaN
49391 zhiwuyuan_aq 2018-03-30 11:00:00 NaN NaN NaN NaN NaN NaN
49392 zhiwuyuan_aq 2018-03-30 12:00:00 NaN NaN NaN NaN NaN NaN
49393 zhiwuyuan_aq 2018-03-30 13:00:00 NaN NaN NaN NaN NaN NaN
49394 zhiwuyuan_aq 2018-03-30 14:00:00 NaN NaN NaN NaN NaN NaN
49395 zhiwuyuan_aq 2018-03-30 15:00:00 NaN NaN NaN NaN NaN NaN
49396 zhiwuyuan_aq 2018-03-30 16:00:00 NaN NaN NaN NaN NaN NaN
49397 zhiwuyuan_aq 2018-03-30 17:00:00 NaN NaN NaN NaN NaN NaN
49398 zhiwuyuan_aq 2018-03-30 18:00:00 NaN NaN NaN NaN NaN NaN
49399 zhiwuyuan_aq 2018-03-30 19:00:00 NaN NaN NaN NaN NaN NaN
49400 zhiwuyuan_aq 2018-03-30 20:00:00 NaN NaN NaN NaN NaN NaN
49401 zhiwuyuan_aq 2018-03-30 21:00:00 NaN NaN NaN NaN NaN NaN
49402 zhiwuyuan_aq 2018-03-30 22:00:00 NaN NaN NaN NaN NaN NaN
49403 zhiwuyuan_aq 2018-03-30 23:00:00 NaN NaN NaN NaN NaN NaN
49404 zhiwuyuan_aq 2018-03-31 00:00:00 NaN NaN NaN NaN NaN NaN
49405 zhiwuyuan_aq 2018-03-31 01:00:00 NaN NaN NaN NaN NaN NaN
49406 zhiwuyuan_aq 2018-03-31 02:00:00 NaN NaN NaN NaN NaN NaN
49407 zhiwuyuan_aq 2018-03-31 03:00:00 NaN NaN NaN NaN NaN NaN
49408 zhiwuyuan_aq 2018-03-31 04:00:00 NaN NaN NaN NaN NaN NaN
49409 zhiwuyuan_aq 2018-03-31 05:00:00 NaN NaN NaN NaN NaN NaN
49410 zhiwuyuan_aq 2018-03-31 06:00:00 NaN NaN NaN NaN NaN NaN
49411 zhiwuyuan_aq 2018-03-31 07:00:00 NaN NaN NaN NaN NaN NaN
49412 zhiwuyuan_aq 2018-03-31 08:00:00 NaN NaN NaN NaN NaN NaN
49413 zhiwuyuan_aq 2018-03-31 09:00:00 NaN NaN NaN NaN NaN NaN
49414 zhiwuyuan_aq 2018-03-31 10:00:00 NaN NaN NaN NaN NaN NaN
49415 zhiwuyuan_aq 2018-03-31 11:00:00 NaN NaN NaN NaN NaN NaN
49416 zhiwuyuan_aq 2018-03-31 12:00:00 NaN NaN NaN NaN NaN NaN
49417 zhiwuyuan_aq 2018-03-31 13:00:00 NaN NaN NaN NaN NaN NaN
49418 zhiwuyuan_aq 2018-03-31 14:00:00 NaN NaN NaN NaN NaN NaN
49419 zhiwuyuan_aq 2018-03-31 15:00:00 NaN NaN NaN NaN NaN NaN

49420 rows × 8 columns
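The rows shown suggest the two Beijing files are consecutive. `beijing_17_18_aq.csv` ends at 2018-01-31 15:00:00, and `beijing_201802_201803_aq.csv` starts at 2018-01-31 16:00:00 with the same columns. Here is a minimal sketch for stacking them. It assumes that if a (station, hour) pair ever appears in both files, the later file's row should win. `combine_aq` is a hypothetical helper.

```python
import pandas as pd


def combine_aq(frames):
    """Stack AQ tables, keep one row per (station, hour), and sort by station then time."""
    aq = pd.concat(frames, ignore_index=True)
    aq["utc_time"] = pd.to_datetime(aq["utc_time"])
    # keep="last": on overlap, prefer the row from the frame listed later (assumed newer).
    aq = aq.drop_duplicates(subset=["stationId", "utc_time"], keep="last")
    return aq.sort_values(["stationId", "utc_time"]).reset_index(drop=True)


# Usage (path as defined in the first cell):
# beijing_aq = combine_aq([pd.read_csv(path + "beijing_17_18_aq.csv"),
#                          pd.read_csv(path + "beijing_201802_201803_aq.csv")])
```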

In [6]:
# read_csv cannot decode a binary .xlsx file (UnicodeDecodeError); read_excel parses it.
beji_aqi_sta = pd.read_excel(path + "Beijing_AirQuality_Stations_en.xlsx")
In [7]:
Lodon_aqi=pd.read_csv(path+"London_historical_aqi_forecast_stations_20180331.csv")
Lodon_aqi
Out[7]:
Unnamed: 0 MeasurementDateGMT station_id PM2.5 (ug/m3) PM10 (ug/m3) NO2 (ug/m3)
0 0 2017/1/1 0:00 CD1 40.0 44.4 36.6
1 1 2017/1/1 1:00 CD1 31.6 34.4 46.2
2 2 2017/1/1 2:00 CD1 24.7 28.1 38.3
3 3 2017/1/1 3:00 CD1 21.2 24.5 32.8
4 4 2017/1/1 4:00 CD1 24.9 23.0 28.1
5 5 2017/1/1 5:00 CD1 24.6 23.9 29.3
6 6 2017/1/1 6:00 CD1 23.9 22.0 28.8
7 7 2017/1/1 7:00 CD1 22.0 22.9 34.6
8 8 2017/1/1 8:00 CD1 19.0 20.1 44.6
9 9 2017/1/1 9:00 CD1 19.9 24.4 55.3
10 10 2017/1/1 10:00 CD1 16.6 17.5 46.4
11 11 2017/1/1 11:00 CD1 14.5 14.6 42.5
12 12 2017/1/1 12:00 CD1 11.0 13.9 44.5
13 13 2017/1/1 13:00 CD1 13.9 13.4 53.6
14 14 2017/1/1 14:00 CD1 8.3 8.6 61.4
15 15 2017/1/1 15:00 CD1 7.3 6.1 57.9
16 16 2017/1/1 16:00 CD1 4.9 6.1 46.0
17 17 2017/1/1 17:00 CD1 4.6 8.1 96.5
18 18 2017/1/1 18:00 CD1 8.3 7.3 64.2
19 19 2017/1/1 19:00 CD1 7.3 8.9 59.7
20 20 2017/1/1 20:00 CD1 7.8 11.3 61.3
21 21 2017/1/1 21:00 CD1 11.1 9.8 57.0
22 22 2017/1/1 22:00 CD1 7.1 13.0 50.9
23 23 2017/1/1 23:00 CD1 6.8 11.4 40.2
24 24 2017/1/2 0:00 CD1 6.2 7.3 33.9
25 25 2017/1/2 1:00 CD1 9.0 9.9 21.7
26 26 2017/1/2 2:00 CD1 8.2 6.4 19.5
27 27 2017/1/2 3:00 CD1 7.0 9.1 18.8
28 28 2017/1/2 4:00 CD1 6.7 8.6 19.7
29 29 2017/1/2 5:00 CD1 7.4 13.3 27.7
... ... ... ... ... ... ...
141631 10867 2018/3/29 19:00 TH4 8.5 16.7 85.9
141632 10868 2018/3/29 20:00 TH4 8.8 19.4 89.6
141633 10869 2018/3/29 21:00 TH4 8.8 17.9 82.4
141634 10870 2018/3/29 22:00 TH4 5.0 14.5 61.8
141635 10871 2018/3/29 23:00 TH4 4.6 14.2 67.6
141636 10872 2018/3/30 0:00 TH4 4.8 11.8 55.4
141637 10873 2018/3/30 1:00 TH4 2.0 11.4 47.4
141638 10874 2018/3/30 2:00 TH4 4.2 13.5 51.4
141639 10875 2018/3/30 3:00 TH4 3.1 13.8 45.8
141640 10876 2018/3/30 4:00 TH4 5.0 12.6 45.4
141641 10877 2018/3/30 5:00 TH4 4.4 13.1 45.6
141642 10878 2018/3/30 6:00 TH4 8.2 14.6 47.4
141643 10879 2018/3/30 7:00 TH4 8.2 18.3 36.6
141644 10880 2018/3/30 8:00 TH4 9.0 15.6 35.0
141645 10881 2018/3/30 9:00 TH4 9.1 17.7 38.0
141646 10882 2018/3/30 10:00 TH4 10.3 12.7 35.3
141647 10883 2018/3/30 11:00 TH4 12.0 13.2 29.4
141648 10884 2018/3/30 12:00 TH4 9.1 13.5 26.2
141649 10885 2018/3/30 13:00 TH4 7.7 15.0 35.3
141650 10886 2018/3/30 14:00 TH4 7.0 12.2 36.6
141651 10887 2018/3/30 15:00 TH4 6.4 11.0 30.9
141652 10888 2018/3/30 16:00 TH4 6.4 14.0 39.8
141653 10889 2018/3/30 17:00 TH4 14.4 16.6 55.2
141654 10890 2018/3/30 18:00 TH4 11.2 18.8 63.3
141655 10891 2018/3/30 19:00 TH4 6.3 16.1 67.7
141656 10892 2018/3/30 20:00 TH4 3.5 11.2 44.3
141657 10893 2018/3/30 21:00 TH4 4.7 12.3 52.8
141658 10894 2018/3/30 22:00 TH4 5.4 14.0 54.7
141659 10895 2018/3/30 23:00 TH4 8.9 16.5 47.0
141660 10896 2018/3/31 0:00 TH4 NaN NaN NaN

141661 rows × 6 columns

In [8]:
Lodon_aqi.isnull().sum()
Out[8]:
Unnamed: 0                0
MeasurementDateGMT        0
station_id                0
PM2.5 (ug/m3)         18676
PM10 (ug/m3)          14553
NO2 (ug/m3)           25445
dtype: int64
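The raw counts are easier to compare as a share of the 141,661 rows. On that basis, about 13% of PM2.5, 10% of PM10 and 18% of NO2 values are missing. The small hypothetical helper below reports both the count and the percentage for every column.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.DataFrame:
    """NaN count and percentage of rows for every column, largest first."""
    counts = df.isnull().sum()
    return (pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
              .sort_values("missing", ascending=False))


# Usage: missing_share(Lodon_aqi)
```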
In [96]:
import pandas as pd


def xlsx_to_csv_pd():
    # Convert the station list to CSV so it can be loaded later with pd.read_csv.
    # pd.read_excel needs an Excel engine: xlrd in older pandas, openpyxl in newer releases.
    data_xls = pd.read_excel(path + 'Beijing_AirQuality_Stations_en.xlsx', index_col=0)
    data_xls.to_csv('Beijing_AirQuality_Stations.csv', encoding='utf-8')


xlsx_to_csv_pd()
In [10]:
len(Lodon_aqi)
Out[10]:
141661
In [11]:
import re
re.split('/|:| ',Lodon_aqi['MeasurementDateGMT'][0])
Out[11]:
['2017', '1', '1', '0', '00']
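Splitting the string with `re` works for a single value, but `pd.to_datetime` with an explicit format can parse the whole column at once. The format string below is inferred from the sample `2017/1/1 0:00`. Strptime-style `%m`, `%d` and `%H` also accept numbers without zero padding. `parse_london_time` is a hypothetical name.

```python
import pandas as pd


def parse_london_time(s: pd.Series) -> pd.Series:
    """Parse strings like '2017/1/1 0:00' into datetime64 values (NaN becomes NaT)."""
    return pd.to_datetime(s, format="%Y/%m/%d %H:%M")


# Usage:
# Lodon_aqi["time"] = parse_london_time(Lodon_aqi["MeasurementDateGMT"])
# Lodon_aqi["hour"] = Lodon_aqi["time"].dt.hour
```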
In [12]:
pd.read_csv(path+"London_historical_aqi_other_stations_20180331.csv")
/home/paslab/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[12]:
Station_ID MeasurementDateGMT PM2.5 (ug/m3) PM10 (ug/m3) NO2 (ug/m3) Unnamed: 5 Unnamed: 6
0 LH0 2017/1/1 0:00 30.2 34.6 15.9 NaN NaN
1 LH0 2017/1/1 1:00 25.4 29.2 11.8 NaN NaN
2 LH0 2017/1/1 2:00 24.7 28.1 11.6 NaN NaN
3 LH0 2017/1/1 3:00 23.6 27.0 13.0 NaN NaN
4 LH0 2017/1/1 4:00 24.2 27.4 27.1 NaN NaN
5 LH0 2017/1/1 5:00 22.8 26.0 22.9 NaN NaN
6 LH0 2017/1/1 6:00 21.6 24.8 26.8 NaN NaN
7 LH0 2017/1/1 7:00 19.9 23.1 39.4 NaN NaN
8 LH0 2017/1/1 8:00 18.3 21.3 41.6 NaN NaN
9 LH0 2017/1/1 9:00 16.3 19.5 44.1 NaN NaN
10 LH0 2017/1/1 10:00 13.3 16.2 49.1 NaN NaN
11 LH0 2017/1/1 11:00 9.4 11.6 45.2 NaN NaN
12 LH0 2017/1/1 12:00 6.1 8.5 41.4 NaN NaN
13 LH0 2017/1/1 13:00 6.7 13.4 53.6 NaN NaN
14 LH0 2017/1/1 14:00 2.1 4.6 11.7 NaN NaN
15 LH0 2017/1/1 15:00 0.9 2.0 12.1 NaN NaN
16 LH0 2017/1/1 16:00 1.1 2.1 12.0 NaN NaN
17 LH0 2017/1/1 17:00 1.0 2.0 13.5 NaN NaN
18 LH0 2017/1/1 18:00 1.5 2.5 14.6 NaN NaN
19 LH0 2017/1/1 19:00 2.1 3.6 14.7 NaN NaN
20 LH0 2017/1/1 20:00 2.9 4.8 15.5 NaN NaN
21 LH0 2017/1/1 21:00 4.5 6.9 14.0 NaN NaN
22 LH0 2017/1/1 22:00 4.9 7.8 11.6 NaN NaN
23 LH0 2017/1/1 23:00 5.9 9.3 11.8 NaN NaN
24 LH0 2017/1/2 0:00 5.1 8.4 9.6 NaN NaN
25 LH0 2017/1/2 1:00 4.5 7.7 6.9 NaN NaN
26 LH0 2017/1/2 2:00 4.3 7.7 7.3 NaN NaN
27 LH0 2017/1/2 3:00 4.9 8.4 6.5 NaN NaN
28 LH0 2017/1/2 4:00 6.7 10.8 7.1 NaN NaN
29 LH0 2017/1/2 5:00 7.0 13.1 15.3 NaN NaN
... ... ... ... ... ... ... ...
141603 NaN NaN NaN NaN NaN NaN NaN
141604 NaN NaN NaN NaN NaN NaN NaN
141605 NaN NaN NaN NaN NaN NaN NaN
141606 NaN NaN NaN NaN NaN NaN NaN
141607 NaN NaN NaN NaN NaN NaN NaN
141608 NaN NaN NaN NaN NaN NaN NaN
141609 NaN NaN NaN NaN NaN NaN NaN
141610 NaN NaN NaN NaN NaN NaN NaN
141611 NaN NaN NaN NaN NaN NaN NaN
141612 NaN NaN NaN NaN NaN NaN NaN
141613 NaN NaN NaN NaN NaN NaN NaN
141614 NaN NaN NaN NaN NaN NaN NaN
141615 NaN NaN NaN NaN NaN NaN NaN
141616 NaN NaN NaN NaN NaN NaN NaN
141617 NaN NaN NaN NaN NaN NaN NaN
141618 NaN NaN NaN NaN NaN NaN NaN
141619 NaN NaN NaN NaN NaN NaN NaN
141620 NaN NaN NaN NaN NaN NaN NaN
141621 NaN NaN NaN NaN NaN NaN NaN
141622 NaN NaN NaN NaN NaN NaN NaN
141623 NaN NaN NaN NaN NaN NaN NaN
141624 NaN NaN NaN NaN NaN NaN NaN
141625 NaN NaN NaN NaN NaN NaN NaN
141626 NaN NaN NaN NaN NaN NaN NaN
141627 NaN NaN NaN NaN NaN NaN NaN
141628 NaN NaN NaN NaN NaN NaN NaN
141629 NaN NaN NaN NaN NaN NaN NaN
141630 NaN NaN NaN NaN NaN NaN NaN
141631 NaN NaN NaN NaN NaN NaN NaN
141632 NaN NaN NaN NaN NaN NaN NaN

141633 rows × 7 columns
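This file needs some cleaning before it can be used. pandas warns that columns 0 and 1 have mixed types, two `Unnamed` columns appear, and the tail of the table is entirely NaN. The sketch below assumes the `Unnamed` columns carry no data, so the code drops them only when they are entirely NaN. `clean_other_stations` is a hypothetical helper. As the warning itself suggests, `low_memory=False` avoids the chunked type inference that triggers the DtypeWarning.

```python
import pandas as pd


def clean_other_stations(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty 'Unnamed' spill-over columns and rows where every field is NaN."""
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
    df = df.drop(columns=[c for c in unnamed if df[c].isnull().all()])
    df = df.dropna(how="all")
    return df.reset_index(drop=True)


# Usage:
# other = clean_other_stations(pd.read_csv(
#     path + "London_historical_aqi_other_stations_20180331.csv", low_memory=False))
```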

Geography

In [13]:
pd.read_csv(path+"London_grid_weather_station.csv")
Out[13]:
london_grid_000 50.5 -2
0 london_grid_001 50.6 -2.0
1 london_grid_002 50.7 -2.0
2 london_grid_003 50.8 -2.0
3 london_grid_004 50.9 -2.0
4 london_grid_005 51.0 -2.0
5 london_grid_006 51.1 -2.0
6 london_grid_007 51.2 -2.0
7 london_grid_008 51.3 -2.0
8 london_grid_009 51.4 -2.0
9 london_grid_010 51.5 -2.0
10 london_grid_011 51.6 -2.0
11 london_grid_012 51.7 -2.0
12 london_grid_013 51.8 -2.0
13 london_grid_014 51.9 -2.0
14 london_grid_015 52.0 -2.0
15 london_grid_016 52.1 -2.0
16 london_grid_017 52.2 -2.0
17 london_grid_018 52.3 -2.0
18 london_grid_019 52.4 -2.0
19 london_grid_020 52.5 -2.0
20 london_grid_021 50.5 -1.9
21 london_grid_022 50.6 -1.9
22 london_grid_023 50.7 -1.9
23 london_grid_024 50.8 -1.9
24 london_grid_025 50.9 -1.9
25 london_grid_026 51.0 -1.9
26 london_grid_027 51.1 -1.9
27 london_grid_028 51.2 -1.9
28 london_grid_029 51.3 -1.9
29 london_grid_030 51.4 -1.9
... ... ... ...
830 london_grid_831 51.7 1.9
831 london_grid_832 51.8 1.9
832 london_grid_833 51.9 1.9
833 london_grid_834 52.0 1.9
834 london_grid_835 52.1 1.9
835 london_grid_836 52.2 1.9
836 london_grid_837 52.3 1.9
837 london_grid_838 52.4 1.9
838 london_grid_839 52.5 1.9
839 london_grid_840 50.5 2.0
840 london_grid_841 50.6 2.0
841 london_grid_842 50.7 2.0
842 london_grid_843 50.8 2.0
843 london_grid_844 50.9 2.0
844 london_grid_845 51.0 2.0
845 london_grid_846 51.1 2.0
846 london_grid_847 51.2 2.0
847 london_grid_848 51.3 2.0
848 london_grid_849 51.4 2.0
849 london_grid_850 51.5 2.0
850 london_grid_851 51.6 2.0
851 london_grid_852 51.7 2.0
852 london_grid_853 51.8 2.0
853 london_grid_854 51.9 2.0
854 london_grid_855 52.0 2.0
855 london_grid_856 52.1 2.0
856 london_grid_857 52.2 2.0
857 london_grid_858 52.3 2.0
858 london_grid_859 52.4 2.0
859 london_grid_860 52.5 2.0

860 rows × 3 columns
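The header line of this output, `london_grid_000 50.5 -2`, is really the first station. The file appears to have no header row, so pandas used that station as the column names and reports 860 rows even though the names run from `london_grid_000` to `london_grid_860`. The same applies to `Beijing_grid_weather_station.csv`. Below is a minimal sketch using `header=None`. The column order (name, latitude, longitude) is inferred from the printed values and from the `longitude`/`latitude` columns of the meo grid file. `read_grid_stations` is a hypothetical name.

```python
import io

import pandas as pd


def read_grid_stations(source):
    """Read a *_grid_weather_station.csv file that has no header row."""
    # Column order is inferred from the values printed in this notebook.
    return pd.read_csv(source, header=None, names=["stationName", "latitude", "longitude"])


# First two rows of the London file as printed above.
grid = read_grid_stations(io.StringIO("london_grid_000,50.5,-2\nlondon_grid_001,50.6,-2.0\n"))
# Full file: read_grid_stations(path + "London_grid_weather_station.csv")
```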

In [14]:
pd.read_csv(path+"Beijing_grid_weather_station.csv")
Out[14]:
beijing_grid_000 39 115
0 beijing_grid_001 39.1 115.0
1 beijing_grid_002 39.2 115.0
2 beijing_grid_003 39.3 115.0
3 beijing_grid_004 39.4 115.0
4 beijing_grid_005 39.5 115.0
5 beijing_grid_006 39.6 115.0
6 beijing_grid_007 39.7 115.0
7 beijing_grid_008 39.8 115.0
8 beijing_grid_009 39.9 115.0
9 beijing_grid_010 40.0 115.0
10 beijing_grid_011 40.1 115.0
11 beijing_grid_012 40.2 115.0
12 beijing_grid_013 40.3 115.0
13 beijing_grid_014 40.4 115.0
14 beijing_grid_015 40.5 115.0
15 beijing_grid_016 40.6 115.0
16 beijing_grid_017 40.7 115.0
17 beijing_grid_018 40.8 115.0
18 beijing_grid_019 40.9 115.0
19 beijing_grid_020 41.0 115.0
20 beijing_grid_021 39.0 115.1
21 beijing_grid_022 39.1 115.1
22 beijing_grid_023 39.2 115.1
23 beijing_grid_024 39.3 115.1
24 beijing_grid_025 39.4 115.1
25 beijing_grid_026 39.5 115.1
26 beijing_grid_027 39.6 115.1
27 beijing_grid_028 39.7 115.1
28 beijing_grid_029 39.8 115.1
29 beijing_grid_030 39.9 115.1
... ... ... ...
620 beijing_grid_621 40.2 117.9
621 beijing_grid_622 40.3 117.9
622 beijing_grid_623 40.4 117.9
623 beijing_grid_624 40.5 117.9
624 beijing_grid_625 40.6 117.9
625 beijing_grid_626 40.7 117.9
626 beijing_grid_627 40.8 117.9
627 beijing_grid_628 40.9 117.9
628 beijing_grid_629 41.0 117.9
629 beijing_grid_630 39.0 118.0
630 beijing_grid_631 39.1 118.0
631 beijing_grid_632 39.2 118.0
632 beijing_grid_633 39.3 118.0
633 beijing_grid_634 39.4 118.0
634 beijing_grid_635 39.5 118.0
635 beijing_grid_636 39.6 118.0
636 beijing_grid_637 39.7 118.0
637 beijing_grid_638 39.8 118.0
638 beijing_grid_639 39.9 118.0
639 beijing_grid_640 40.0 118.0
640 beijing_grid_641 40.1 118.0
641 beijing_grid_642 40.2 118.0
642 beijing_grid_643 40.3 118.0
643 beijing_grid_644 40.4 118.0
644 beijing_grid_645 40.5 118.0
645 beijing_grid_646 40.6 118.0
646 beijing_grid_647 40.7 118.0
647 beijing_grid_648 40.8 118.0
648 beijing_grid_649 40.9 118.0
649 beijing_grid_650 41.0 118.0

650 rows × 3 columns
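Both listings look like regular 0.1° lattices numbered with latitude varying fastest. For example, `beijing_grid_000` is at (39.0, 115.0), `beijing_grid_021` returns to latitude 39.0 one step east, and `beijing_grid_650` is at (41.0, 118.0). The London grid runs from (50.5, -2.0) to `london_grid_860` at (52.5, 2.0). Under that inferred layout, an air-quality station can be mapped to its nearest grid point arithmetically. `nearest_grid` and `GRID_LAYOUT` are hypothetical names, and the layout should be checked against the full station files before anyone relies on it. The code uses `str.format` instead of f-strings because the notebook runs on Python 3.5.

```python
GRID_STEP = 0.1

# city: (first latitude, first longitude, latitude steps, longitude steps, name prefix),
# inferred from the grid station listings printed in this notebook.
GRID_LAYOUT = {
    "beijing": (39.0, 115.0, 21, 31, "beijing_grid"),
    "london": (50.5, -2.0, 21, 41, "london_grid"),
}


def nearest_grid(lat, lon, city):
    """Name of the grid point closest to (lat, lon), clamped to the lattice edges."""
    lat0, lon0, n_lat, n_lon, prefix = GRID_LAYOUT[city]
    i_lat = min(max(int(round((lat - lat0) / GRID_STEP)), 0), n_lat - 1)
    i_lon = min(max(int(round((lon - lon0) / GRID_STEP)), 0), n_lon - 1)
    # Latitude varies fastest: each longitude column holds n_lat consecutive ids.
    return "{}_{:03d}".format(prefix, i_lon * n_lat + i_lat)
```

Clamping maps points outside the lattice to its edge. A point far outside the grid should probably be rejected instead.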

In [16]:
pd.read_csv(path+"Beijing_historical_meo_grid.csv")
Out[16]:
stationName longitude latitude utc_time temperature pressure humidity wind_direction wind_speed/kph
0 beijing_grid_000 115.0 39.0 2017-01-01 00:00:00 -5.47 984.73 76.60 53.71 3.53
1 beijing_grid_001 115.0 39.1 2017-01-01 00:00:00 -5.53 979.33 75.40 43.59 3.11
2 beijing_grid_002 115.0 39.2 2017-01-01 00:00:00 -5.70 963.14 71.80 0.97 2.75
3 beijing_grid_003 115.0 39.3 2017-01-01 00:00:00 -5.88 946.94 68.20 327.65 3.84
4 beijing_grid_004 115.0 39.4 2017-01-01 00:00:00 -5.34 928.80 58.81 317.85 6.14
5 beijing_grid_005 115.0 39.5 2017-01-01 00:00:00 -4.81 910.66 49.43 313.46 8.52
6 beijing_grid_006 115.0 39.6 2017-01-01 00:00:00 -4.98 889.48 45.64 309.89 10.05
7 beijing_grid_007 115.0 39.7 2017-01-01 00:00:00 -5.49 866.77 44.66 306.63 11.16
8 beijing_grid_008 115.0 39.8 2017-01-01 00:00:00 -6.17 853.42 46.57 299.83 11.28
9 beijing_grid_009 115.0 39.9 2017-01-01 00:00:00 -7.17 858.79 54.29 281.74 9.98
10 beijing_grid_010 115.0 40.0 2017-01-01 00:00:00 -8.17 864.16 62.00 260.98 9.88
11 beijing_grid_011 115.0 40.1 2017-01-01 00:00:00 -6.76 876.30 57.20 247.27 11.31
12 beijing_grid_012 115.0 40.2 2017-01-01 00:00:00 -5.36 888.44 52.40 237.08 13.24
13 beijing_grid_013 115.0 40.3 2017-01-01 00:00:00 -4.82 900.98 51.40 239.28 11.96
14 beijing_grid_014 115.0 40.4 2017-01-01 00:00:00 -4.71 913.71 52.30 250.54 9.22
15 beijing_grid_015 115.0 40.5 2017-01-01 00:00:00 -4.75 922.50 52.79 269.04 6.78
16 beijing_grid_016 115.0 40.6 2017-01-01 00:00:00 -5.08 923.38 52.44 302.43 5.00
17 beijing_grid_017 115.0 40.7 2017-01-01 00:00:00 -5.41 924.27 52.10 343.16 5.72
18 beijing_grid_018 115.0 40.8 2017-01-01 00:00:00 -6.45 905.98 52.74 324.80 5.15
19 beijing_grid_019 115.0 40.9 2017-01-01 00:00:00 -7.48 887.69 53.39 304.51 5.19
20 beijing_grid_020 115.0 41.0 2017-01-01 00:00:00 -7.83 881.60 53.60 298.12 5.35
21 beijing_grid_021 115.1 39.0 2017-01-01 00:00:00 -5.45 987.94 77.35 58.56 3.64
22 beijing_grid_022 115.1 39.1 2017-01-01 00:00:00 -5.51 983.02 76.21 50.52 3.17
23 beijing_grid_023 115.1 39.2 2017-01-01 00:00:00 -5.71 968.24 72.78 11.10 2.40
24 beijing_grid_024 115.1 39.3 2017-01-01 00:00:00 -5.90 953.47 69.36 330.71 3.10
25 beijing_grid_025 115.1 39.4 2017-01-01 00:00:00 -5.46 934.70 60.79 318.81 5.34
26 beijing_grid_026 115.1 39.5 2017-01-01 00:00:00 -5.01 915.94 52.22 314.02 7.67
27 beijing_grid_027 115.1 39.6 2017-01-01 00:00:00 -5.04 896.41 47.78 310.47 9.33
28 beijing_grid_028 115.1 39.7 2017-01-01 00:00:00 -5.32 876.51 45.40 307.44 10.68
29 beijing_grid_029 115.1 39.8 2017-01-01 00:00:00 -5.86 863.21 46.22 300.78 11.01
... ... ... ... ... ... ... ... ... ...
7034676 beijing_grid_621 117.9 40.2 2018-03-27 05:00:00 22.69 985.53 31.77 201.38 14.97
7034677 beijing_grid_622 117.9 40.3 2018-03-27 05:00:00 21.88 970.68 31.89 199.32 14.71
7034678 beijing_grid_623 117.9 40.4 2018-03-27 05:00:00 21.03 950.44 31.57 197.29 14.03
7034679 beijing_grid_624 117.9 40.5 2018-03-27 05:00:00 20.76 938.21 30.39 196.51 13.90
7034680 beijing_grid_625 117.9 40.6 2018-03-27 05:00:00 21.66 942.01 27.50 198.39 14.87
7034681 beijing_grid_626 117.9 40.7 2018-03-27 05:00:00 22.55 945.81 24.61 200.04 15.85
7034682 beijing_grid_627 117.9 40.8 2018-03-27 05:00:00 22.66 944.54 23.21 208.02 17.13
7034683 beijing_grid_628 117.9 40.9 2018-03-27 05:00:00 22.78 943.28 21.82 214.79 18.69
7034684 beijing_grid_629 117.9 41.0 2018-03-27 05:00:00 22.82 942.85 21.35 216.79 19.27
7034685 beijing_grid_630 118.0 39.0 2018-03-27 05:00:00 12.92 1006.22 64.88 186.68 21.87
7034686 beijing_grid_631 118.0 39.1 2018-03-27 05:00:00 13.83 1006.12 62.22 187.76 21.84
7034687 beijing_grid_632 118.0 39.2 2018-03-27 05:00:00 16.54 1005.80 54.24 190.98 21.82
7034688 beijing_grid_633 118.0 39.3 2018-03-27 05:00:00 19.26 1005.48 46.25 194.20 21.87
7034689 beijing_grid_634 118.0 39.4 2018-03-27 05:00:00 22.16 1005.15 38.41 198.41 21.41
7034690 beijing_grid_635 118.0 39.5 2018-03-27 05:00:00 25.05 1004.83 30.56 202.79 21.08
7034691 beijing_grid_636 118.0 39.6 2018-03-27 05:00:00 25.77 1004.40 27.31 204.97 20.04
7034692 beijing_grid_637 118.0 39.7 2018-03-27 05:00:00 25.84 1003.93 26.35 206.15 18.61
7034693 beijing_grid_638 118.0 39.8 2018-03-27 05:00:00 25.67 1002.11 26.35 206.98 17.12
7034694 beijing_grid_639 118.0 39.9 2018-03-27 05:00:00 25.00 997.59 28.25 206.78 15.51
7034695 beijing_grid_640 118.0 40.0 2018-03-27 05:00:00 24.06 993.07 30.15 206.53 13.89
7034696 beijing_grid_641 118.0 40.1 2018-03-27 05:00:00 23.37 989.98 31.11 203.77 14.45
7034697 beijing_grid_642 118.0 40.2 2018-03-27 05:00:00 22.68 986.90 32.06 201.23 15.03
7034698 beijing_grid_643 118.0 40.3 2018-03-27 05:00:00 21.86 972.13 32.22 198.90 14.71
7034699 beijing_grid_644 118.0 40.4 2018-03-27 05:00:00 20.98 951.51 31.98 196.40 13.95
7034700 beijing_grid_645 118.0 40.5 2018-03-27 05:00:00 20.70 939.23 30.83 195.12 13.78
7034701 beijing_grid_646 118.0 40.6 2018-03-27 05:00:00 21.64 943.63 27.87 196.62 14.79
7034702 beijing_grid_647 118.0 40.7 2018-03-27 05:00:00 22.58 948.03 24.92 197.92 15.80
7034703 beijing_grid_648 118.0 40.8 2018-03-27 05:00:00 22.64 945.85 23.57 206.12 16.94
7034704 beijing_grid_649 118.0 40.9 2018-03-27 05:00:00 22.71 943.67 22.23 213.17 18.38
7034705 beijing_grid_650 118.0 41.0 2018-03-27 05:00:00 22.73 942.95 21.78 215.27 18.91

7034706 rows × 9 columns
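With 7,034,706 rows, this table is large to hold with default dtypes. The sketch below assumes the header names printed above. It stores each numeric column as float32 (half the bytes of the default float64), keeps the repeated grid names as a category, and parses `utc_time` while reading. `read_meo_grid` and `MEO_DTYPES` are hypothetical names, and float32 precision (about 7 significant digits) is assumed to be enough for these measurements.

```python
import io

import pandas as pd

MEO_DTYPES = {
    "stationName": "category",
    "longitude": "float32",
    "latitude": "float32",
    "temperature": "float32",
    "pressure": "float32",
    "humidity": "float32",
    "wind_direction": "float32",
    "wind_speed/kph": "float32",
}


def read_meo_grid(source):
    """Load the grid weather file with compact dtypes and parsed timestamps."""
    return pd.read_csv(source, dtype=MEO_DTYPES, parse_dates=["utc_time"])


# Two rows copied from the output above, to show the call without the full file.
SAMPLE = """stationName,longitude,latitude,utc_time,temperature,pressure,humidity,wind_direction,wind_speed/kph
beijing_grid_000,115.0,39.0,2017-01-01 00:00:00,-5.47,984.73,76.60,53.71,3.53
beijing_grid_001,115.0,39.1,2017-01-01 00:00:00,-5.53,979.33,75.40,43.59,3.11
"""
meo = read_meo_grid(io.StringIO(SAMPLE))
# Full file: meo = read_meo_grid(path + "Beijing_historical_meo_grid.csv")
```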

The geographic EDA starts here.

The stations in Beijing

In [89]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [90]:
beijingAqCsv = pd.read_csv('../ml_dataset/2018_kdd_cup_dataset/beijing_17_18_aq.csv')
air_station_beij = beijingAqCsv.groupby("stationId", sort=False) 
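`air_station_beij.groups` prints every row index for each station, which makes the output long to read. A compact per-station table may be easier to scan. The sketch below uses only columns from `beijing_17_18_aq.csv`, and `station_summary` is a hypothetical helper. It compares `utc_time` as strings, which order correctly because they are in `YYYY-MM-DD HH:MM:SS` form.

```python
import pandas as pd


def station_summary(aq: pd.DataFrame) -> pd.DataFrame:
    """Row count, first/last timestamp and mean PM2.5 (NaN skipped) per station."""
    g = aq.groupby("stationId", sort=False)
    return pd.DataFrame({
        "rows": g.size(),
        "first": g["utc_time"].min(),
        "last": g["utc_time"].max(),
        "pm25_mean": g["PM2.5"].mean(),
    })


# Usage: station_summary(beijingAqCsv)
```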
In [91]:
air_station_beij.groups
Out[91]:
{'aotizhongxin_aq': Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
             ...
             8876, 8877, 8878, 8879, 8880, 8881, 8882, 8883, 8884, 8885],
            dtype='int64', length=8886),
 'badaling_aq': Int64Index([ 8886,  8887,  8888,  8889,  8890,  8891,  8892,  8893,  8894,
              8895,
             ...
             17762, 17763, 17764, 17765, 17766, 17767, 17768, 17769, 17770,
             17771],
            dtype='int64', length=8886),
 'beibuxinqu_aq': Int64Index([17772, 17773, 17774, 17775, 17776, 17777, 17778, 17779, 17780,
             17781,
             ...
             26648, 26649, 26650, 26651, 26652, 26653, 26654, 26655, 26656,
             26657],
            dtype='int64', length=8886),
 'daxing_aq': Int64Index([26658, 26659, 26660, 26661, 26662, 26663, 26664, 26665, 26666,
             26667,
             ...
             35534, 35535, 35536, 35537, 35538, 35539, 35540, 35541, 35542,
             35543],
            dtype='int64', length=8886),
 'dingling_aq': Int64Index([35544, 35545, 35546, 35547, 35548, 35549, 35550, 35551, 35552,
             35553,
             ...
             44420, 44421, 44422, 44423, 44424, 44425, 44426, 44427, 44428,
             44429],
            dtype='int64', length=8886),
 'donggaocun_aq': Int64Index([44430, 44431, 44432, 44433, 44434, 44435, 44436, 44437, 44438,
             44439,
             ...
             53306, 53307, 53308, 53309, 53310, 53311, 53312, 53313, 53314,
             53315],
            dtype='int64', length=8886),
 'dongsi_aq': Int64Index([53316, 53317, 53318, 53319, 53320, 53321, 53322, 53323, 53324,
             53325,
             ...
             62192, 62193, 62194, 62195, 62196, 62197, 62198, 62199, 62200,
             62201],
            dtype='int64', length=8886),
 'dongsihuan_aq': Int64Index([62202, 62203, 62204, 62205, 62206, 62207, 62208, 62209, 62210,
             62211,
             ...
             71078, 71079, 71080, 71081, 71082, 71083, 71084, 71085, 71086,
             71087],
            dtype='int64', length=8886),
 'fangshan_aq': Int64Index([71088, 71089, 71090, 71091, 71092, 71093, 71094, 71095, 71096,
             71097,
             ...
             79964, 79965, 79966, 79967, 79968, 79969, 79970, 79971, 79972,
             79973],
            dtype='int64', length=8886),
 'fengtaihuayuan_aq': Int64Index([79974, 79975, 79976, 79977, 79978, 79979, 79980, 79981, 79982,
             79983,
             ...
             88850, 88851, 88852, 88853, 88854, 88855, 88856, 88857, 88858,
             88859],
            dtype='int64', length=8886),
 'guanyuan_aq': Int64Index([88860, 88861, 88862, 88863, 88864, 88865, 88866, 88867, 88868,
             88869,
             ...
             97736, 97737, 97738, 97739, 97740, 97741, 97742, 97743, 97744,
             97745],
            dtype='int64', length=8886),
 'gucheng_aq': Int64Index([ 97746,  97747,  97748,  97749,  97750,  97751,  97752,  97753,
              97754,  97755,
             ...
             106622, 106623, 106624, 106625, 106626, 106627, 106628, 106629,
             106630, 106631],
            dtype='int64', length=8886),
 'huairou_aq': Int64Index([106632, 106633, 106634, 106635, 106636, 106637, 106638, 106639,
             106640, 106641,
             ...
             115508, 115509, 115510, 115511, 115512, 115513, 115514, 115515,
             115516, 115517],
            dtype='int64', length=8886),
 'liulihe_aq': Int64Index([115518, 115519, 115520, 115521, 115522, 115523, 115524, 115525,
             115526, 115527,
             ...
             124394, 124395, 124396, 124397, 124398, 124399, 124400, 124401,
             124402, 124403],
            dtype='int64', length=8886),
 'mentougou_aq': Int64Index([124404, 124405, 124406, 124407, 124408, 124409, 124410, 124411,
             124412, 124413,
             ...
             133280, 133281, 133282, 133283, 133284, 133285, 133286, 133287,
             133288, 133289],
            dtype='int64', length=8886),
 'miyun_aq': Int64Index([133290, 133291, 133292, 133293, 133294, 133295, 133296, 133297,
             133298, 133299,
             ...
             142166, 142167, 142168, 142169, 142170, 142171, 142172, 142173,
             142174, 142175],
            dtype='int64', length=8886),
 'miyunshuiku_aq': Int64Index([142176, 142177, 142178, 142179, 142180, 142181, 142182, 142183,
             142184, 142185,
             ...
             151052, 151053, 151054, 151055, 151056, 151057, 151058, 151059,
             151060, 151061],
            dtype='int64', length=8886),
 'nansanhuan_aq': Int64Index([151062, 151063, 151064, 151065, 151066, 151067, 151068, 151069,
             151070, 151071,
             ...
             159938, 159939, 159940, 159941, 159942, 159943, 159944, 159945,
             159946, 159947],
            dtype='int64', length=8886),
 'nongzhanguan_aq': Int64Index([159948, 159949, 159950, 159951, 159952, 159953, 159954, 159955,
             159956, 159957,
             ...
             168824, 168825, 168826, 168827, 168828, 168829, 168830, 168831,
             168832, 168833],
            dtype='int64', length=8886),
 'pingchang_aq': Int64Index([168834, 168835, 168836, 168837, 168838, 168839, 168840, 168841,
             168842, 168843,
             ...
             177710, 177711, 177712, 177713, 177714, 177715, 177716, 177717,
             177718, 177719],
            dtype='int64', length=8886),
 'pinggu_aq': Int64Index([177720, 177721, 177722, 177723, 177724, 177725, 177726, 177727,
             177728, 177729,
             ...
             186596, 186597, 186598, 186599, 186600, 186601, 186602, 186603,
             186604, 186605],
            dtype='int64', length=8886),
 'qianmen_aq': Int64Index([186606, 186607, 186608, 186609, 186610, 186611, 186612, 186613,
             186614, 186615,
             ...
             195482, 195483, 195484, 195485, 195486, 195487, 195488, 195489,
             195490, 195491],
            dtype='int64', length=8886),
 'shunyi_aq': Int64Index([195492, 195493, 195494, 195495, 195496, 195497, 195498, 195499,
             195500, 195501,
             ...
             204368, 204369, 204370, 204371, 204372, 204373, 204374, 204375,
             204376, 204377],
            dtype='int64', length=8886),
 'tiantan_aq': Int64Index([204378, 204379, 204380, 204381, 204382, 204383, 204384, 204385,
             204386, 204387,
             ...
             213254, 213255, 213256, 213257, 213258, 213259, 213260, 213261,
             213262, 213263],
            dtype='int64', length=8886),
 'tongzhou_aq': Int64Index([213264, 213265, 213266, 213267, 213268, 213269, 213270, 213271,
             213272, 213273,
             ...
             222140, 222141, 222142, 222143, 222144, 222145, 222146, 222147,
             222148, 222149],
            dtype='int64', length=8886),
 'wanliu_aq': Int64Index([222150, 222151, 222152, 222153, 222154, 222155, 222156, 222157,
             222158, 222159,
             ...
             231026, 231027, 231028, 231029, 231030, 231031, 231032, 231033,
             231034, 231035],
            dtype='int64', length=8886),
 'wanshouxigong_aq': Int64Index([231036, 231037, 231038, 231039, 231040, 231041, 231042, 231043,
             231044, 231045,
             ...
             239912, 239913, 239914, 239915, 239916, 239917, 239918, 239919,
             239920, 239921],
            dtype='int64', length=8886),
 'xizhimenbei_aq': Int64Index([239922, 239923, 239924, 239925, 239926, 239927, 239928, 239929,
             239930, 239931,
             ...
             248798, 248799, 248800, 248801, 248802, 248803, 248804, 248805,
             248806, 248807],
            dtype='int64', length=8886),
 'yanqin_aq': Int64Index([248808, 248809, 248810, 248811, 248812, 248813, 248814, 248815,
             248816, 248817,
             ...
             257684, 257685, 257686, 257687, 257688, 257689, 257690, 257691,
             257692, 257693],
            dtype='int64', length=8886),
 'yizhuang_aq': Int64Index([257694, 257695, 257696, 257697, 257698, 257699, 257700, 257701,
             257702, 257703,
             ...
             266570, 266571, 266572, 266573, 266574, 266575, 266576, 266577,
             266578, 266579],
            dtype='int64', length=8886),
 'yongdingmennei_aq': Int64Index([266580, 266581, 266582, 266583, 266584, 266585, 266586, 266587,
             266588, 266589,
             ...
             275456, 275457, 275458, 275459, 275460, 275461, 275462, 275463,
             275464, 275465],
            dtype='int64', length=8886),
 'yongledian_aq': Int64Index([275466, 275467, 275468, 275469, 275470, 275471, 275472, 275473,
             275474, 275475,
             ...
             284342, 284343, 284344, 284345, 284346, 284347, 284348, 284349,
             284350, 284351],
            dtype='int64', length=8886),
 'yufa_aq': Int64Index([284352, 284353, 284354, 284355, 284356, 284357, 284358, 284359,
             284360, 284361,
             ...
             293228, 293229, 293230, 293231, 293232, 293233, 293234, 293235,
             293236, 293237],
            dtype='int64', length=8886),
 'yungang_aq': Int64Index([293238, 293239, 293240, 293241, 293242, 293243, 293244, 293245,
             293246, 293247,
             ...
             302114, 302115, 302116, 302117, 302118, 302119, 302120, 302121,
             302122, 302123],
            dtype='int64', length=8886),
 'zhiwuyuan_aq': Int64Index([302124, 302125, 302126, 302127, 302128, 302129, 302130, 302131,
             302132, 302133,
             ...
             311000, 311001, 311002, 311003, 311004, 311005, 311006, 311007,
             311008, 311009],
            dtype='int64', length=8886)}
In [97]:
beijingAqCsv_sta = pd.read_csv('Beijing_AirQuality_Stations.csv')
In [98]:
beijingAqCsv_sta
Out[98]:
Pollutant Species Unnamed: 1 Unnamed: 2
0 type Unit NaN
1 PM2.5 µg/m3 (microgram/cubic meter) NaN
2 PM10 µg/m3 (microgram/cubic meter) NaN
3 SO2 µg/m3 (microgram/cubic meter) NaN
4 NO2 µg/m3 (microgram/cubic meter) NaN
5 O3 µg/m3 (microgram/cubic meter) NaN
6 CO µg/m3 (microgram/cubic meter) NaN
7 NaN NaN NaN
8 Stations at Beijing NaN NaN
9 Station ID longitude latitude
10 Urban Stations NaN NaN
11 dongsi_aq 116.417 39.929
12 tiantan_aq 116.407 39.886
13 guanyuan_aq 116.339 39.929
14 wanshouxigong_aq 116.352 39.878
15 aotizhongxin_aq 116.397 39.982
16 nongzhanguan_aq 116.461 39.937
17 wanliu_aq 116.287 39.987
18 beibuxinqu_aq 116.174 40.09
19 zhiwuyuan_aq 116.207 40.002
20 fengtaihuayuan_aq 116.279 39.863
21 yungang_aq 116.146 39.824
22 gucheng_aq 116.184 39.914
23 NaN NaN NaN
24 Suburban Stations NaN NaN
25 fangshan_aq 116.136 39.742
26 daxing_aq 116.404 39.718
27 yizhuang_aq 116.506 39.795
28 tongzhou_aq 116.663 39.886
29 shunyi_aq 116.655 40.127
30 pingchang_aq 116.23 40.217
31 mentougou_aq 116.106 39.937
32 pinggu_aq 117.1 40.143
33 huairou_aq 116.628 40.328
34 miyun_aq 116.832 40.37
35 yanqin_aq 115.972 40.453
36 NaN NaN NaN
37 Other Stations NaN NaN
38 dingling_aq 116.22 40.292
39 badaling_aq 115.988 40.365
40 miyunshuiku_aq 116.911 40.499
41 donggaocun_aq 117.12 40.1
42 yongledian_aq 116.783 39.712
43 yufa_aq 116.3 39.52
44 liulihe_aq 116 39.58
45 NaN NaN NaN
46 Stations Near Traffic NaN NaN
47 qianmen_aq 116.395 39.899
48 yongdingmennei_aq 116.394 39.876
49 xizhimenbei_aq 116.349 39.954
50 nansanhuan_aq 116.368 39.856
51 dongsihuan_aq 116.483 39.939
In [99]:
import folium
In [100]:
list(air_station_beij.groups.keys())
Out[100]:
['dongsihuan_aq',
 'wanshouxigong_aq',
 'dingling_aq',
 'mentougou_aq',
 'yufa_aq',
 'badaling_aq',
 'beibuxinqu_aq',
 'guanyuan_aq',
 'miyun_aq',
 'huairou_aq',
 'wanliu_aq',
 'aotizhongxin_aq',
 'nongzhanguan_aq',
 'yungang_aq',
 'zhiwuyuan_aq',
 'dongsi_aq',
 'yongledian_aq',
 'tiantan_aq',
 'qianmen_aq',
 'liulihe_aq',
 'pinggu_aq',
 'yongdingmennei_aq',
 'nansanhuan_aq',
 'fangshan_aq',
 'fengtaihuayuan_aq',
 'daxing_aq',
 'gucheng_aq',
 'xizhimenbei_aq',
 'yizhuang_aq',
 'pingchang_aq',
 'tongzhou_aq',
 'yanqin_aq',
 'donggaocun_aq',
 'shunyi_aq',
 'miyunshuiku_aq']
In [167]:
location_beijin={}
In [168]:
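station_ids = set(air_station_beij.groups.keys())  # station IDs present in the AQ readings
for i in range(len(beijingAqCsv_sta)):
    temp = beijingAqCsv_sta.iloc[i, :]
    if temp['Pollutant Species'] in station_ids:
        # The sheet stores [longitude, latitude]; folium expects [latitude, longitude]
        location_beijin[temp['Pollutant Species']] = temp.values[1:].astype(float).tolist()[::-1]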
In [169]:
location_beijin
Out[169]:
{'aotizhongxin_aq': [39.982, 116.397],
 'badaling_aq': [40.365, 115.988],
 'beibuxinqu_aq': [40.09, 116.174],
 'daxing_aq': [39.718, 116.404],
 'dingling_aq': [40.292, 116.22],
 'donggaocun_aq': [40.1, 117.12],
 'dongsi_aq': [39.929, 116.417],
 'dongsihuan_aq': [39.939, 116.483],
 'fangshan_aq': [39.742, 116.136],
 'fengtaihuayuan_aq': [39.863, 116.279],
 'guanyuan_aq': [39.929, 116.339],
 'gucheng_aq': [39.914, 116.184],
 'huairou_aq': [40.328, 116.628],
 'liulihe_aq': [39.58, 116.0],
 'mentougou_aq': [39.937, 116.106],
 'miyun_aq': [40.37, 116.832],
 'miyunshuiku_aq': [40.499, 116.911],
 'nansanhuan_aq': [39.856, 116.368],
 'nongzhanguan_aq': [39.937, 116.461],
 'pingchang_aq': [40.217, 116.23],
 'pinggu_aq': [40.143, 117.1],
 'qianmen_aq': [39.899, 116.395],
 'shunyi_aq': [40.127, 116.655],
 'tiantan_aq': [39.886, 116.407],
 'tongzhou_aq': [39.886, 116.663],
 'wanliu_aq': [39.987, 116.287],
 'wanshouxigong_aq': [39.878, 116.352],
 'xizhimenbei_aq': [39.954, 116.349],
 'yanqin_aq': [40.453, 115.972],
 'yizhuang_aq': [39.795, 116.506],
 'yongdingmennei_aq': [39.876, 116.394],
 'yongledian_aq': [39.712, 116.783],
 'yufa_aq': [39.52, 116.3],
 'yungang_aq': [39.824, 116.146],
 'zhiwuyuan_aq': [40.002, 116.207]}
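From the coordinates above, the urban stations sit within a few kilometres of each other, while stations such as badaling_aq or miyunshuiku_aq lie far from the centre. One rough way to quantify station density is each station's distance to its nearest neighbour. The sketch below uses the haversine formula on a spherical Earth of radius 6371 km, which is accurate enough at this scale; `haversine_matrix` and `nearest_station` are hypothetical helper names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_matrix(latlon_deg: np.ndarray) -> np.ndarray:
    """Pairwise great-circle distances in km for an (n, 2) array of [lat, lon] degrees."""
    lat = np.radians(latlon_deg[:, 0])[:, None]  # column vector (n, 1)
    lon = np.radians(latlon_deg[:, 1])[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_station(locations: dict) -> dict:
    """Map each station ID to (nearest other station ID, distance in km)."""
    names = list(locations)
    dist = haversine_matrix(np.array([locations[n] for n in names], dtype=float))
    np.fill_diagonal(dist, np.inf)  # ignore each station's distance to itself
    idx = dist.argmin(axis=1)
    return {n: (names[j], float(dist[i, j])) for i, (n, j) in enumerate(zip(names, idx))}
```

Calling `nearest_station(location_beijin)` would give, for every station, its closest neighbour and the distance between them.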
In [157]:
# Centre the first map on the first station's coordinates (stored as [longitude, latitude] strings)
temp = beijingAqCsv_sta.loc[beijingAqCsv_sta['Pollutant Species'] == list(air_station_beij.groups.keys())[0]].iloc[:, 1:].values[0]
In [158]:
temp
Out[158]:
array(['116.483', '39.939'], dtype=object)
In [159]:
temp=temp.astype(float).tolist()
In [160]:
temp = temp[::-1]  # reorder to [latitude, longitude], as folium expects
In [165]:
map_beijin_1 = folium.Map(location=temp, zoom_start=9)
In [170]:
for key in location_beijin:
    folium.Marker(location=location_beijin[key]).add_to(map_beijin_1)
map_beijin_1
Out[170]:
In [163]:
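temp = beijingAqCsv_sta.loc[beijingAqCsv_sta['Pollutant Species'] == list(air_station_beij.groups.keys())[0]].iloc[:, 1:].values[0]
temp = temp.astype(float).tolist()[::-1]  # [longitude, latitude] -> [latitude, longitude]
# Same station markers on a terrain base map. Stamen tiles are now hosted by
# Stadia Maps, so recent folium versions may need a different `tiles` value plus `attr`.
map_beijin_1 = folium.Map(location=temp, zoom_start=9, tiles='Stamen Terrain')
for key in location_beijin:
    folium.Marker(location=location_beijin[key]).add_to(map_beijin_1)
map_beijin_1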
Out[163]:
In [33]:
from folium import plugins
from folium.plugins import HeatMap

Beijing grid

In [75]:
beijing_grid_sta=pd.read_csv(path+"Beijing_historical_meo_grid.csv")
In [76]:
beijing_grid_sta
Out[76]:
stationName longitude latitude utc_time temperature pressure humidity wind_direction wind_speed/kph
0 beijing_grid_000 115.0 39.0 2017-01-01 00:00:00 -5.47 984.73 76.60 53.71 3.53
1 beijing_grid_001 115.0 39.1 2017-01-01 00:00:00 -5.53 979.33 75.40 43.59 3.11
2 beijing_grid_002 115.0 39.2 2017-01-01 00:00:00 -5.70 963.14 71.80 0.97 2.75
3 beijing_grid_003 115.0 39.3 2017-01-01 00:00:00 -5.88 946.94 68.20 327.65 3.84
4 beijing_grid_004 115.0 39.4 2017-01-01 00:00:00 -5.34 928.80 58.81 317.85 6.14
5 beijing_grid_005 115.0 39.5 2017-01-01 00:00:00 -4.81 910.66 49.43 313.46 8.52
6 beijing_grid_006 115.0 39.6 2017-01-01 00:00:00 -4.98 889.48 45.64 309.89 10.05
7 beijing_grid_007 115.0 39.7 2017-01-01 00:00:00 -5.49 866.77 44.66 306.63 11.16
8 beijing_grid_008 115.0 39.8 2017-01-01 00:00:00 -6.17 853.42 46.57 299.83 11.28
9 beijing_grid_009 115.0 39.9 2017-01-01 00:00:00 -7.17 858.79 54.29 281.74 9.98
10 beijing_grid_010 115.0 40.0 2017-01-01 00:00:00 -8.17 864.16 62.00 260.98 9.88
11 beijing_grid_011 115.0 40.1 2017-01-01 00:00:00 -6.76 876.30 57.20 247.27 11.31
12 beijing_grid_012 115.0 40.2 2017-01-01 00:00:00 -5.36 888.44 52.40 237.08 13.24
13 beijing_grid_013 115.0 40.3 2017-01-01 00:00:00 -4.82 900.98 51.40 239.28 11.96
14 beijing_grid_014 115.0 40.4 2017-01-01 00:00:00 -4.71 913.71 52.30 250.54 9.22
15 beijing_grid_015 115.0 40.5 2017-01-01 00:00:00 -4.75 922.50 52.79 269.04 6.78
16 beijing_grid_016 115.0 40.6 2017-01-01 00:00:00 -5.08 923.38 52.44 302.43 5.00
17 beijing_grid_017 115.0 40.7 2017-01-01 00:00:00 -5.41 924.27 52.10 343.16 5.72
18 beijing_grid_018 115.0 40.8 2017-01-01 00:00:00 -6.45 905.98 52.74 324.80 5.15
19 beijing_grid_019 115.0 40.9 2017-01-01 00:00:00 -7.48 887.69 53.39 304.51 5.19
20 beijing_grid_020 115.0 41.0 2017-01-01 00:00:00 -7.83 881.60 53.60 298.12 5.35
21 beijing_grid_021 115.1 39.0 2017-01-01 00:00:00 -5.45 987.94 77.35 58.56 3.64
22 beijing_grid_022 115.1 39.1 2017-01-01 00:00:00 -5.51 983.02 76.21 50.52 3.17
23 beijing_grid_023 115.1 39.2 2017-01-01 00:00:00 -5.71 968.24 72.78 11.10 2.40
24 beijing_grid_024 115.1 39.3 2017-01-01 00:00:00 -5.90 953.47 69.36 330.71 3.10
25 beijing_grid_025 115.1 39.4 2017-01-01 00:00:00 -5.46 934.70 60.79 318.81 5.34
26 beijing_grid_026 115.1 39.5 2017-01-01 00:00:00 -5.01 915.94 52.22 314.02 7.67
27 beijing_grid_027 115.1 39.6 2017-01-01 00:00:00 -5.04 896.41 47.78 310.47 9.33
28 beijing_grid_028 115.1 39.7 2017-01-01 00:00:00 -5.32 876.51 45.40 307.44 10.68
29 beijing_grid_029 115.1 39.8 2017-01-01 00:00:00 -5.86 863.21 46.22 300.78 11.01
... ... ... ... ... ... ... ... ... ...
7034676 beijing_grid_621 117.9 40.2 2018-03-27 05:00:00 22.69 985.53 31.77 201.38 14.97
7034677 beijing_grid_622 117.9 40.3 2018-03-27 05:00:00 21.88 970.68 31.89 199.32 14.71
7034678 beijing_grid_623 117.9 40.4 2018-03-27 05:00:00 21.03 950.44 31.57 197.29 14.03
7034679 beijing_grid_624 117.9 40.5 2018-03-27 05:00:00 20.76 938.21 30.39 196.51 13.90
7034680 beijing_grid_625 117.9 40.6 2018-03-27 05:00:00 21.66 942.01 27.50 198.39 14.87
7034681 beijing_grid_626 117.9 40.7 2018-03-27 05:00:00 22.55 945.81 24.61 200.04 15.85
7034682 beijing_grid_627 117.9 40.8 2018-03-27 05:00:00 22.66 944.54 23.21 208.02 17.13
7034683 beijing_grid_628 117.9 40.9 2018-03-27 05:00:00 22.78 943.28 21.82 214.79 18.69
7034684 beijing_grid_629 117.9 41.0 2018-03-27 05:00:00 22.82 942.85 21.35 216.79 19.27
7034685 beijing_grid_630 118.0 39.0 2018-03-27 05:00:00 12.92 1006.22 64.88 186.68 21.87
7034686 beijing_grid_631 118.0 39.1 2018-03-27 05:00:00 13.83 1006.12 62.22 187.76 21.84
7034687 beijing_grid_632 118.0 39.2 2018-03-27 05:00:00 16.54 1005.80 54.24 190.98 21.82
7034688 beijing_grid_633 118.0 39.3 2018-03-27 05:00:00 19.26 1005.48 46.25 194.20 21.87
7034689 beijing_grid_634 118.0 39.4 2018-03-27 05:00:00 22.16 1005.15 38.41 198.41 21.41
7034690 beijing_grid_635 118.0 39.5 2018-03-27 05:00:00 25.05 1004.83 30.56 202.79 21.08
7034691 beijing_grid_636 118.0 39.6 2018-03-27 05:00:00 25.77 1004.40 27.31 204.97 20.04
7034692 beijing_grid_637 118.0 39.7 2018-03-27 05:00:00 25.84 1003.93 26.35 206.15 18.61
7034693 beijing_grid_638 118.0 39.8 2018-03-27 05:00:00 25.67 1002.11 26.35 206.98 17.12
7034694 beijing_grid_639 118.0 39.9 2018-03-27 05:00:00 25.00 997.59 28.25 206.78 15.51
7034695 beijing_grid_640 118.0 40.0 2018-03-27 05:00:00 24.06 993.07 30.15 206.53 13.89
7034696 beijing_grid_641 118.0 40.1 2018-03-27 05:00:00 23.37 989.98 31.11 203.77 14.45
7034697 beijing_grid_642 118.0 40.2 2018-03-27 05:00:00 22.68 986.90 32.06 201.23 15.03
7034698 beijing_grid_643 118.0 40.3 2018-03-27 05:00:00 21.86 972.13 32.22 198.90 14.71
7034699 beijing_grid_644 118.0 40.4 2018-03-27 05:00:00 20.98 951.51 31.98 196.40 13.95
7034700 beijing_grid_645 118.0 40.5 2018-03-27 05:00:00 20.70 939.23 30.83 195.12 13.78
7034701 beijing_grid_646 118.0 40.6 2018-03-27 05:00:00 21.64 943.63 27.87 196.62 14.79
7034702 beijing_grid_647 118.0 40.7 2018-03-27 05:00:00 22.58 948.03 24.92 197.92 15.80
7034703 beijing_grid_648 118.0 40.8 2018-03-27 05:00:00 22.64 945.85 23.57 206.12 16.94
7034704 beijing_grid_649 118.0 40.9 2018-03-27 05:00:00 22.71 943.67 22.23 213.17 18.38
7034705 beijing_grid_650 118.0 41.0 2018-03-27 05:00:00 22.73 942.95 21.78 215.27 18.91

7034706 rows × 9 columns

In [108]:
# Count grid points per timestamp (group once instead of on every iteration)
grid_by_time = beijing_grid_sta.groupby('utc_time')
for t in sorted(set(time)):
    print(len(grid_by_time.get_group(t)))
651
651
651
651
651
651
651
651
651
651
651
651
651
651
651
651
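The 651 points per hour match a 31 × 21 grid: longitude 115.0 to 118.0 and latitude 39.0 to 41.0 in 0.1° steps, as in the table above. There are 10,806 hourly timestamps from 2017-01-01 00:00 to 2018-03-27 05:00, and 651 × 10,806 = 7,034,706, exactly the frame's row count, which is consistent with no hour being missing. The hypothetical helper below runs the same check on the whole frame, assuming a regular grid with spacing `step` and hourly `utc_time` values.

```python
import pandas as pd


def check_grid_completeness(df: pd.DataFrame, step: float = 0.1) -> dict:
    """Compare the row count with a full longitude x latitude x hour grid.

    Assumes grid spacing `step` (degrees) and hourly utc_time stamps.
    """
    n_lon = int(round((df["longitude"].max() - df["longitude"].min()) / step)) + 1
    n_lat = int(round((df["latitude"].max() - df["latitude"].min()) / step)) + 1
    times = pd.to_datetime(df["utc_time"])
    n_hours = int((times.max() - times.min()) / pd.Timedelta(hours=1)) + 1
    per_hour = df.groupby("utc_time").size()  # points recorded at each timestamp
    return {
        "grid_points": n_lon * n_lat,
        "hours": n_hours,
        "expected_rows": n_lon * n_lat * n_hours,
        "actual_rows": len(df),
        "hours_with_gaps": int((per_hour != n_lon * n_lat).sum()),
        "hours_missing": n_hours - per_hour.size,
    }
```

On `beijing_grid_sta` this should report 651 grid points and 10,806 hours; any non-zero `hours_with_gaps` or `hours_missing` would point to incomplete snapshots.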
In [77]:
beiji_grid={}
In [78]:
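# MiniBatchKMeans is a faster approximation of KMeans on large inputs.
# Cluster the grid into as many groups as there are Beijing AQ stations.
from sklearn.cluster import MiniBatchKMeans
# Every grid point repeats once per hour, so fitting on the unique coordinates
# optimises the same objective at a fraction of the cost.
grid_coords = beijing_grid_sta[['latitude', 'longitude']].drop_duplicates()
kmeans = MiniBatchKMeans(n_clusters=len(location_beijin), batch_size=1000).fit(grid_coords)
beijing_grid_sta.loc[:, 'label'] = kmeans.predict(beijing_grid_sta[['latitude', 'longitude']])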
In [79]:
map_beijing_2 = folium.Map(location=beijing_grid_sta[['latitude', 'longitude']].iloc[0].values.tolist(),
                           zoom_start=9)
# One marker per K-means cluster centre ([latitude, longitude])
for center in kmeans.cluster_centers_:
    folium.Marker(location=center.tolist()).add_to(map_beijing_2)

map_beijing_2
Out[79]:
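K-means centres computed on a regular grid are not tied to any air-quality station, so they say little about which weather cell each station falls in. A common alternative, which is a different technique from the clustering above and is sketched here under the assumption that the nearest grid point is representative of a station, is to match each station to its closest grid point. `nearest_grid_point` is a hypothetical helper; it uses an equirectangular approximation (longitude differences scaled by cos(latitude)), which is reasonable over a region of roughly 3° × 2°.

```python
import numpy as np


def nearest_grid_point(stations: dict, grid_latlon: np.ndarray, grid_names: list) -> dict:
    """Map each station ID to the name of its closest grid point.

    stations: {station_id: [lat, lon]}; grid_latlon: (m, 2) array of [lat, lon].
    """
    names = list(stations)
    st = np.asarray([stations[n] for n in names], dtype=float)   # (n, 2)
    grid = np.asarray(grid_latlon, dtype=float)                   # (m, 2)
    cos_lat = np.cos(np.radians(st[:, 0]))[:, None]               # (n, 1)
    dlat = st[:, None, 0] - grid[None, :, 0]                      # (n, m)
    dlon = (st[:, None, 1] - grid[None, :, 1]) * cos_lat          # (n, m), scaled
    nearest = np.argmin(dlat ** 2 + dlon ** 2, axis=1)
    return {n: grid_names[j] for n, j in zip(names, nearest)}
```

For example, with `grid_pts = beijing_grid_sta.drop_duplicates('stationName')`, calling `nearest_grid_point(location_beijin, grid_pts[['latitude', 'longitude']].to_numpy(), grid_pts['stationName'].tolist())` would map every `*_aq` station to one grid `stationName`.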
In [105]:
map_heat_ground_beijing_1 = folium.Map(location=beijing_grid_sta[['latitude', 'longitude']].iloc[0].values.tolist(),
                                       zoom_start=6)
In [81]:
# Imports for the heat-map section (repeated so this part can run on its own)
import folium
from folium import plugins
import numpy as np
import pandas as pd
In [82]:
lats = beijing_grid_sta['latitude'].iloc[:1000].to_numpy(dtype=float)
In [83]:
lats 
Out[83]:
array([39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. ,
       40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. ,
       39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1,
       40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1,
       39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2,
       40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2,
       39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3,
       40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3,
       39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4,
       40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4,
       39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5,
       40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5,
       39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6,
       40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6,
       39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7,
       40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7,
       39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8,
       40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8,
       39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9,
       41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9,
       40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. ,
       39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. ,
       40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. ,
       39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1,
       40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1,
       39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2,
       40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2,
       39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3,
       40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3,
       39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4,
       40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4,
       39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5,
       40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5,
       39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6,
       40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6,
       39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7,
       40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7,
       39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8,
       40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8,
       39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9,
       41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9,
       40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. ,
       39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. ,
       40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. ,
       39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1,
       40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1,
       39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2,
       40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2,
       39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3,
       40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3,
       39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4,
       40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4,
       39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5,
       40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5,
       39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6,
       40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6,
       39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7,
       40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7,
       39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8,
       40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8,
       39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9,
       41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9,
       40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. ,
       39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. ,
       40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. ,
       39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1,
       40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1,
       39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2,
       40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2,
       39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3,
       40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3,
       39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4,
       40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4,
       39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5,
       40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5,
       39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6,
       40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6,
       39.7, 39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7,
       40.8, 40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7,
       39.8, 39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8,
       40.9, 41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8,
       39.9, 40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9,
       41. , 39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9,
       40. , 40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. ,
       39. , 39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. ,
       40.1, 40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. ,
       39.1, 39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1,
       40.2, 40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1,
       39.2, 39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2,
       40.3, 40.4, 40.5, 40.6, 40.7, 40.8, 40.9, 41. , 39. , 39.1, 39.2,
       39.3, 39.4, 39.5, 39.6, 39.7, 39.8, 39.9, 40. , 40.1, 40.2])
In [84]:
# Keep only the first 10,000 rows (15 full hourly snapshots plus part of a 16th) so the map stays light
subset = beijing_grid_sta.iloc[:10000]
lats = subset['latitude'].to_numpy(dtype=float)
lons = subset['longitude'].to_numpy(dtype=float)
mag = subset['temperature'].to_numpy(dtype=float)
time = subset['utc_time'].to_numpy(dtype=str)
# Rows are [lat, lon, temperature, utc_time]; because utc_time is a string,
# np.vstack upcasts the whole array to strings (dtype '<U32' below)
data = np.vstack((lats, lons, mag, time)).T
data[:, :3]
Out[84]:
array([['39.0', '115.0', '-5.47'],
       ['39.1', '115.0', '-5.53'],
       ['39.2', '115.0', '-5.7'],
       ...,
       ['39.1', '116.1', '-2.79'],
       ['39.2', '116.1', '-2.83'],
       ['39.3', '116.1', '-2.88']], dtype='<U32')
In [85]:
set(time)
Out[85]:
{'2017-01-01 00:00:00',
 '2017-01-01 01:00:00',
 '2017-01-01 02:00:00',
 '2017-01-01 03:00:00',
 '2017-01-01 04:00:00',
 '2017-01-01 05:00:00',
 '2017-01-01 06:00:00',
 '2017-01-01 07:00:00',
 '2017-01-01 08:00:00',
 '2017-01-01 09:00:00',
 '2017-01-01 10:00:00',
 '2017-01-01 11:00:00',
 '2017-01-01 12:00:00',
 '2017-01-01 13:00:00',
 '2017-01-01 14:00:00',
 '2017-01-01 15:00:00'}
In [86]:
list(set(time))
Out[86]:
['2017-01-01 00:00:00',
 '2017-01-01 07:00:00',
 '2017-01-01 09:00:00',
 '2017-01-01 15:00:00',
 '2017-01-01 14:00:00',
 '2017-01-01 04:00:00',
 '2017-01-01 10:00:00',
 '2017-01-01 01:00:00',
 '2017-01-01 03:00:00',
 '2017-01-01 11:00:00',
 '2017-01-01 05:00:00',
 '2017-01-01 06:00:00',
 '2017-01-01 08:00:00',
 '2017-01-01 12:00:00',
 '2017-01-01 13:00:00',
 '2017-01-01 02:00:00']
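`list(set(time))` returns the timestamps in an arbitrary order (Out[86] starts 00:00, 07:00, 09:00, ...), so frames built by indexing into it will not play chronologically. Grouping the rows directly avoids both the ordering problem and the string round trip through `data`. Note also that 10,000 rows are 15 full hours of 651 points plus 235 points of the 16th hour (15:00), so the last frame only covers part of the grid. The helper below (`heatmap_frames`, a hypothetical name) is a sketch that assumes the column names of `beijing_grid_sta`.

```python
import pandas as pd


def heatmap_frames(df: pd.DataFrame, value_col: str = "temperature"):
    """Build HeatMapWithTime input: one list of [lat, lon, value] per timestamp.

    groupby sorts its keys, and utc_time is an ISO string, so the frames
    come out in chronological order.
    """
    index, frames = [], []
    for t, group in df.groupby("utc_time", sort=True):
        index.append(t)
        frames.append(group[["latitude", "longitude", value_col]].astype(float).values.tolist())
    return index, frames
```

`index, frames = heatmap_frames(beijing_grid_sta.iloc[:10000])` could then be passed as `plugins.HeatMapWithTime(frames, index=index, radius=15)`, where the `index` argument labels each frame with its timestamp.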
In [97]:
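# Sort the timestamps so the heat-map frames play in chronological order;
# list(set(time)) has an arbitrary order (see Out[86] above).
times_sorted = sorted(set(time))
time_to_frame = {t: k for k, t in enumerate(times_sorted)}  # timestamp -> frame number
init_ = [[] for _ in times_sorted]  # one list of [lat, lon, temperature] points per frame
init_
Out[97]:
[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
In [98]:
for row in data:
    # row = [lat, lon, temperature, utc_time], all stored as strings
    init_[time_to_frame[row[3]]].append(row[:3].astype(float).tolist())
In [99]:
len(init_)
Out[99]:
16
In [106]:
# Temperature is used as the heat-map weight
colormap = {.4: 'blue', .65: 'lime', 1: 'red'}

# Animated heat map: one frame per timestamp, labelled with its utc_time
hm = plugins.HeatMapWithTime(init_, index=times_sorted, radius=15)
hm.add_to(map_heat_ground_beijing_1)
# Static alternative (first frame only):
# plugins.HeatMap(init_[0], radius=20, gradient=colormap).add_to(map_heat_ground_beijing_1)

# Display the map
map_heat_ground_beijing_1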
Out[106]:
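The heat-map weights are raw temperatures (apparently in °C), many of which are negative in January, for example -5.47 at `beijing_grid_000`. Heat-map layers treat the weight as an intensity, so negative or unscaled values will likely render poorly. One option, sketched below as an assumption rather than the notebook's method, is to min-max scale the weights to [0, 1] with a single range shared by all frames; `minmax_weights` is a hypothetical helper.

```python
import numpy as np


def minmax_weights(frames, lo=None, hi=None):
    """Rescale the weight (third entry) of every [lat, lon, w] point to [0, 1].

    One global range is used across all frames so colours stay comparable
    from hour to hour; pass lo/hi to fix the range explicitly instead.
    """
    values = np.array([p[2] for frame in frames for p in frame], dtype=float)
    lo = values.min() if lo is None else lo
    hi = values.max() if hi is None else hi
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant input
    return [[[p[0], p[1], float((p[2] - lo) / span)] for p in frame] for frame in frames]
```

`plugins.HeatMapWithTime(minmax_weights(init_), radius=15)` would then draw the coldest point of the whole period at zero intensity and the warmest at full intensity.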

Stations in London

In [132]:
LondonAqCsv_sta = pd.read_csv(path+'London_AirQuality_Stations.csv')
In [133]:
LondonAqCsv_sta
Out[133]:
Unnamed: 0 api_data need_prediction historical_data Latitude Longitude SiteType SiteName
0 BX9 True NaN True 51.465983 0.184877 Suburban Bexley - Slade Green FDMS
1 BX1 True NaN True 51.465983 0.184877 Suburban Bexley - Slade Green
2 BL0 True True True 51.522287 -0.125848 Urban Background Camden - Bloomsbury
3 CD9 True True True 51.527707 -0.129053 Roadside Camden - Euston Road
4 CD1 True True True 51.544219 -0.175284 Kerbside Camden - Swiss Cottage
5 CT2 True NaN True 51.514525 -0.104516 Kerbside City of London - Farringdon Street
6 CT3 True NaN True 51.513847 -0.077766 Urban Background City of London - Sir John Cass School
7 CR8 NaN NaN True 51.410039 -0.127523 Urban Background Croydon - Norbury Manor
8 GN0 True True True 51.490532 0.074003 Roadside Greenwich - A206 Burrage Grove
9 GR4 True True True 51.452580 0.070766 Suburban Greenwich - Eltham
10 GN3 True True True 51.486957 0.095111 Roadside Greenwich - Plumstead High Street
11 GR9 True True True 51.456357 0.040725 Roadside Greenwich - Westhorne Avenue
12 GB0 NaN NaN True 51.456300 0.085606 Roadside Greenwich and Bexley - Falconwood FDMS
13 HR1 NaN NaN True 51.617327 -0.298775 Urban Background Harrow - Stanmore
14 HV1 True True True 51.520787 0.205461 Roadside Havering - Rainham
15 LH0 NaN NaN True 51.488780 -0.441627 Urban Background Hillingdon - Harlington
16 KC1 NaN NaN True 51.521047 -0.213492 Urban Background Kensington and Chelsea - North Ken
17 KF1 True True True 51.521047 -0.213492 Urban Background Kensington and Chelsea - North Ken FIDAS
18 LW2 True True True 51.474954 -0.039641 Roadside Lewisham - New Cross
19 RB7 True NaN True 51.569484 0.082907 Urban Background Redbridge - Ley Street
20 TD5 True NaN True 51.425256 -0.345608 Suburban Richmond Upon Thames - Bushy Park
21 ST5 True True True 51.389287 -0.141662 Industrial Sutton - Beddington Lane north
22 TH4 True True True 51.515046 -0.008418 Roadside Tower Hamlets - Blackwall
23 MY7 True True True 51.522540 -0.154590 Kerbside Westminster - Marylebone Road FDMS
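Counting the `need_prediction` column above, 13 of the 24 London stations are flagged for prediction; the other cells are blank and are read as NaN. A small filter makes this explicit. This is a sketch with a hypothetical helper, `prediction_stations`; comparing with `True` via `eq` treats the NaN cells as False without a separate fill step.

```python
import pandas as pd


def prediction_stations(sta: pd.DataFrame) -> list:
    """Return the station IDs whose need_prediction flag is True (NaN counts as False)."""
    flag = sta["need_prediction"].eq(True)
    return sta.loc[flag, "Unnamed: 0"].tolist()
```

`prediction_stations(LondonAqCsv_sta)` should return the 13 IDs from BL0 to MY7 that carry `True` in the table.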
In [136]:
# Map each London station ID to [latitude, longitude] for the folium markers
location_Lodon = {}
for name, lat, lon in zip(LondonAqCsv_sta['Unnamed: 0'],
                          LondonAqCsv_sta['Latitude'],
                          LondonAqCsv_sta['Longitude']):
    location_Lodon[name] = [lat, lon]
In [137]:
location_Lodon
Out[137]:
{'BL0': [51.522287, -0.12584800000000002],
 'BX1': [51.46598327, 0.184877127],
 'BX9': [51.46598327, 0.184877127],
 'CD1': [51.544219, -0.175284],
 'CD9': [51.52770662, -0.129053205],
 'CR8': [51.410039000000005, -0.127523],
 'CT2': [51.51452534, -0.104515626],
 'CT3': [51.51384718, -0.077765682],
 'GB0': [51.4563, 0.08560599999999999],
 'GN0': [51.490532, 0.074003],
 'GN3': [51.486957000000004, 0.095111],
 'GR4': [51.45258, 0.070766],
 'GR9': [51.456357000000004, 0.040725],
 'HR1': [51.617327, -0.298775],
 'HV1': [51.52078746, 0.20546070600000002],
 'KC1': [51.52104675, -0.21349214],
 'KF1': [51.52104675, -0.21349214],
 'LH0': [51.48878, -0.44162700000000005],
 'LW2': [51.474954, -0.039641],
 'MY7': [51.52254, -0.15459],
 'RB7': [51.56948433, 0.082907475],
 'ST5': [51.3892869, -0.141661525],
 'TD5': [51.42525604, -0.345608291],
 'TH4': [51.51504617, -0.008418493]}
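Two pairs in the dictionary above share identical coordinates: BX1/BX9 (Bexley - Slade Green) and KC1/KF1 (Kensington and Chelsea - North Ken). Their markers are drawn on top of each other, so the map appears to show 22 sites rather than 24. The sketch below (hypothetical helper `colocated_groups`) lists such groups by exact coordinate match, which works here because each pair carries the same values in the source file.

```python
def colocated_groups(locations: dict) -> list:
    """Group station IDs that share exactly the same [lat, lon].

    Markers for such stations overlap, so only one is visible on the map.
    """
    by_coord = {}
    for name, (lat, lon) in locations.items():
        by_coord.setdefault((lat, lon), []).append(name)
    return [sorted(names) for names in by_coord.values() if len(names) > 1]
```

`colocated_groups(location_Lodon)` should return the two pairs above; offsetting one marker slightly, or merging each pair into a single popup, would make both stations visible.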
In [139]:
#del temp
col_list=['Latitude','Longitude']
temp_pd=LondonAqCsv_sta[col_list]
In [140]:
temp_pd.iloc[0].values.tolist()
Out[140]:
[51.46598327, 0.184877127]
In [46]:
map_london_1 = folium.Map(location=temp_pd.iloc[0].values.tolist(), zoom_start=9)
In [47]:
for key in location_Lodon:
    folium.Marker(location=location_Lodon[key]).add_to(map_london_1)
map_london_1
Out[47]:
In [141]:
map_london_1= folium.Map(location=temp_pd.iloc[0].values.tolist(), zoom_start=9,tiles='Stamen Terrain')
In [142]:
for key in location_Lodon:
    folium.Marker(location=location_Lodon[key]).add_to(map_london_1)
map_london_1
Out[142]:

Adding time-series EDA

In [81]:
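beji_aqi_2018 = pd.read_csv(path + "beijing_201802_201803_aq.csv")
beji_aqi_2017_2018 = pd.read_csv(path + "beijing_17_18_aq.csv")
# Stack both files and order all readings by timestamp; utc_time is an
# ISO-formatted string, so lexical order equals chronological order
beji_aqi_summary = pd.concat([beji_aqi_2018, beji_aqi_2017_2018], axis=0).sort_values(by=['utc_time'])
# Replace the duplicated per-file indices with a fresh 0..n-1 index
beji_aqi_summary = beji_aqi_summary.reset_index(drop=True)
beji_aqi_summary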
Out[81]:
stationId utc_time PM2.5 PM10 NO2 CO O3 SO2
0 yizhuang_aq 2017-01-01 14:00:00 278.0 362.0 117.0 5.7 6.0 2.0
1 tongzhou_aq 2017-01-01 14:00:00 376.0 409.0 128.0 5.1 2.0 9.0
2 pingchang_aq 2017-01-01 14:00:00 495.0 588.0 152.0 7.6 5.0 5.0
3 fengtaihuayuan_aq 2017-01-01 14:00:00 391.0 496.0 134.0 6.5 6.0 5.0
4 gucheng_aq 2017-01-01 14:00:00 500.0 612.0 161.0 7.7 3.0 11.0
5 tiantan_aq 2017-01-01 14:00:00 357.0 449.0 116.0 6.2 2.0 4.0
6 yanqin_aq 2017-01-01 14:00:00 206.0 227.0 84.0 4.4 43.0 45.0
7 aotizhongxin_aq 2017-01-01 14:00:00 453.0 467.0 156.0 7.2 3.0 9.0
8 miyun_aq 2017-01-01 14:00:00 465.0 607.0 121.0 NaN 2.0 5.0
9 donggaocun_aq 2017-01-01 14:00:00 377.0 NaN 123.0 5.5 2.0 6.0
10 beibuxinqu_aq 2017-01-01 14:00:00 479.0 487.0 166.0 7.4 4.0 9.0
11 yufa_aq 2017-01-01 14:00:00 285.0 322.0 87.0 5.2 5.0 14.0
12 liulihe_aq 2017-01-01 14:00:00 376.0 447.0 116.0 5.5 2.0 16.0
13 yungang_aq 2017-01-01 14:00:00 415.0 495.0 136.0 6.4 2.0 8.0
14 badaling_aq 2017-01-01 14:00:00 87.0 115.0 84.0 1.5 2.0 49.0
15 xizhimenbei_aq 2017-01-01 14:00:00 514.0 NaN 171.0 8.3 8.0 15.0
16 mentougou_aq 2017-01-01 14:00:00 564.0 679.0 173.0 8.3 10.0 19.0
17 shunyi_aq 2017-01-01 14:00:00 386.0 477.0 116.0 6.3 4.0 8.0
18 huairou_aq 2017-01-01 14:00:00 496.0 675.0 137.0 0.7 2.0 4.0
19 dongsihuan_aq 2017-01-01 14:00:00 390.0 394.0 99.0 6.8 4.0 11.0
20 wanshouxigong_aq 2017-01-01 14:00:00 416.0 474.0 140.0 0.6 7.0 5.0
21 wanliu_aq 2017-01-01 14:00:00 468.0 518.0 187.0 7.5 6.0 7.0
22 guanyuan_aq 2017-01-01 14:00:00 476.0 548.0 158.0 6.9 2.0 7.0
23 yongdingmennei_aq 2017-01-01 14:00:00 415.0 NaN 143.0 6.7 12.0 12.0
24 yongledian_aq 2017-01-01 14:00:00 329.0 NaN 130.0 5.5 6.0 12.0
25 zhiwuyuan_aq 2017-01-01 14:00:00 458.0 490.0 157.0 8.1 8.0 5.0
26 qianmen_aq 2017-01-01 14:00:00 436.0 NaN 157.0 6.8 2.0 5.0
27 nansanhuan_aq 2017-01-01 14:00:00 431.0 467.0 147.0 6.5 3.0 14.0
28 dongsi_aq 2017-01-01 14:00:00 469.0 594.0 136.0 8.9 2.0 7.0
29 dingling_aq 2017-01-01 14:00:00 339.0 372.0 137.0 5.9 6.0 18.0
... ... ... ... ... ... ... ... ...
360400 dongsihuan_aq 2018-03-31 15:00:00 154.0 221.0 158.0 1.8 2.0 13.0
360401 dongsi_aq 2018-03-31 15:00:00 178.0 233.0 140.0 1.8 15.0 12.0
360402 huairou_aq 2018-03-31 15:00:00 108.0 139.0 41.0 1.1 68.0 7.0
360403 pinggu_aq 2018-03-31 15:00:00 158.0 NaN 49.0 1.4 85.0 2.0
360404 wanliu_aq 2018-03-31 15:00:00 128.0 182.0 146.0 1.5 3.0 8.0
360405 gucheng_aq 2018-03-31 15:00:00 168.0 320.0 164.0 1.4 2.0 7.0
360406 yungang_aq 2018-03-31 15:00:00 152.0 202.0 53.0 1.0 95.0 18.0
360407 zhiwuyuan_aq 2018-03-31 15:00:00 NaN NaN NaN NaN NaN NaN
360408 shunyi_aq 2018-03-31 15:00:00 150.0 194.0 72.0 1.3 80.0 15.0
360409 xizhimenbei_aq 2018-03-31 15:00:00 183.0 229.0 165.0 2.2 2.0 17.0
360410 beibuxinqu_aq 2018-03-31 15:00:00 174.0 NaN 121.0 1.3 2.0 4.0
360411 yufa_aq 2018-03-31 15:00:00 227.0 268.0 76.0 1.6 47.0 11.0
360412 yanqin_aq 2018-03-31 15:00:00 121.0 307.0 118.0 1.1 12.0 8.0
360413 miyun_aq 2018-03-31 15:00:00 114.0 196.0 132.0 1.1 3.0 7.0
360414 dingling_aq 2018-03-31 15:00:00 101.0 138.0 44.0 0.9 64.0 6.0
360415 mentougou_aq 2018-03-31 15:00:00 121.0 175.0 88.0 1.0 47.0 7.0
360416 miyunshuiku_aq 2018-03-31 15:00:00 98.0 146.0 36.0 0.9 98.0 7.0
360417 aotizhongxin_aq 2018-03-31 15:00:00 118.0 255.0 174.0 1.7 2.0 10.0
360418 yongdingmennei_aq 2018-03-31 15:00:00 161.0 224.0 180.0 2.1 2.0 12.0
360419 nansanhuan_aq 2018-03-31 15:00:00 190.0 279.0 180.0 2.2 2.0 18.0
360420 tiantan_aq 2018-03-31 15:00:00 133.0 140.0 113.0 1.3 2.0 2.0
360421 daxing_aq 2018-03-31 15:00:00 166.0 251.0 159.0 1.7 2.0 11.0
360422 fangshan_aq 2018-03-31 15:00:00 160.0 289.0 172.0 1.4 2.0 11.0
360423 donggaocun_aq 2018-03-31 15:00:00 156.0 NaN 33.0 0.9 121.0 14.0
360424 fengtaihuayuan_aq 2018-03-31 15:00:00 158.0 256.0 178.0 1.9 3.0 12.0
360425 liulihe_aq 2018-03-31 15:00:00 136.0 204.0 61.0 1.1 50.0 NaN
360426 nongzhanguan_aq 2018-03-31 15:00:00 148.0 200.0 161.0 1.7 2.0 10.0
360427 yizhuang_aq 2018-03-31 15:00:00 137.0 192.0 166.0 1.6 5.0 9.0
360428 tongzhou_aq 2018-03-31 15:00:00 197.0 355.0 139.0 1.3 19.0 18.0
360429 guanyuan_aq 2018-03-31 15:00:00 139.0 191.0 151.0 2.1 2.0 8.0

360430 rows × 8 columns
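The cell above stacks the two Beijing files and renumbers the rows. The sketch below is a more defensive variant, shown on a tiny synthetic frame instead of the real CSVs. It also parses `utc_time` into real timestamps. It drops rows duplicated across the two files, in case their time ranges overlap; that overlap is an assumption worth checking. Finally, it sorts by time and then station, so the row order is deterministic. The function name `combine_periods` is hypothetical.

```python
import pandas as pd


def combine_periods(*frames):
    """Stack per-period air-quality frames into one time-ordered frame."""
    combined = pd.concat(frames, axis=0, ignore_index=True)
    combined['utc_time'] = pd.to_datetime(combined['utc_time'])
    # Keep the first copy of any (station, hour) pair present in more than one file
    combined = combined.drop_duplicates(subset=['stationId', 'utc_time'], keep='first')
    # Sort by time, then station, and renumber rows 0..n-1
    combined = combined.sort_values(['utc_time', 'stationId'])
    return combined.reset_index(drop=True)


# Synthetic example: the 15:00 reading of station "a" appears in both files
early = pd.DataFrame({'stationId': ['a', 'b', 'a'],
                      'utc_time': ['2018-01-31 14:00:00', '2018-01-31 14:00:00',
                                   '2018-01-31 15:00:00'],
                      'PM2.5': [10.0, 20.0, 11.0]})
late = pd.DataFrame({'stationId': ['a', 'a'],
                     'utc_time': ['2018-01-31 15:00:00', '2018-01-31 16:00:00'],
                     'PM2.5': [11.0, 12.0]})

combined = combine_periods(late, early)
print(combined)
```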

In [82]:
beji_aqi_summary.isnull().sum()/len(beji_aqi_summary)
Out[82]:
stationId    0.000000
utc_time     0.000000
PM2.5        0.065086
PM10         0.266834
NO2          0.060261
CO           0.128025
O3           0.065844
SO2          0.060106
dtype: float64
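The expression `isnull().sum()/len(...)` above gives one overall missing ratio per column. `isnull()` returns booleans, so `.mean()` yields the same ratio. Grouping that boolean frame by station breaks the ratio down per station and per pollutant. The sketch below shows this on synthetic data. The helper `missing_ratio_by_station` is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_ratio_by_station(df, station_col='stationId', time_col='utc_time'):
    """Fraction of missing values per pollutant column, one row per station."""
    values = df.drop(columns=[station_col, time_col])
    return values.isnull().astype(float).groupby(df[station_col]).mean()


df = pd.DataFrame({
    'stationId': ['a', 'a', 'a', 'a', 'b', 'b'],
    'utc_time': ['2017-01-01 14:00:00', '2017-01-01 15:00:00', '2017-01-01 16:00:00',
                 '2017-01-01 17:00:00', '2017-01-01 14:00:00', '2017-01-01 15:00:00'],
    'PM2.5': [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    'PM10': [np.nan, np.nan, 3.0, 4.0, np.nan, 6.0],
})

ratios = missing_ratio_by_station(df)
print(ratios)
```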
In [83]:
beji_aqi_summary.groupby(['stationId']).groups
Out[83]:
{'aotizhongxin_aq': Int64Index([     7,     48,     96,    128,    167,    200,    219,    260,
                314,    329,
             ...
             360089, 360121, 360151, 360207, 360246, 360279, 360293, 360326,
             360393, 360417],
            dtype='int64', length=10298),
 'badaling_aq': Int64Index([    14,     68,     93,    117,    166,    179,    243,    252,
                292,    320,
             ...
             360099, 360128, 360180, 360191, 360221, 360287, 360323, 360334,
             360378, 360396],
            dtype='int64', length=10298),
 'beibuxinqu_aq': Int64Index([    10,     43,     94,    125,    149,    205,    216,    257,
                301,    338,
             ...
             360093, 360142, 360159, 360219, 360229, 360283, 360314, 360348,
             360390, 360410],
            dtype='int64', length=10298),
 'daxing_aq': Int64Index([    30,     57,     76,    138,    169,    195,    229,    248,
                294,    322,
             ...
             360096, 360127, 360153, 360195, 360235, 360273, 360306, 360332,
             360387, 360421],
            dtype='int64', length=10298),
 'dingling_aq': Int64Index([    29,     51,     84,    133,    155,    206,    222,    279,
                299,    331,
             ...
             360082, 360148, 360152, 360218, 360242, 360282, 360309, 360336,
             360392, 360414],
            dtype='int64', length=10298),
 'donggaocun_aq': Int64Index([     9,     69,    101,    110,    161,    208,    225,    273,
                304,    323,
             ...
             360084, 360143, 360170, 360208, 360236, 360260, 360318, 360341,
             360369, 360423],
            dtype='int64', length=10298),
 'dongsi_aq': Int64Index([    28,     58,     78,    132,    143,    203,    237,    246,
                308,    333,
             ...
             360108, 360140, 360172, 360185, 360241, 360269, 360310, 360351,
             360373, 360401],
            dtype='int64', length=10298),
 'dongsihuan_aq': Int64Index([    19,     44,     83,    109,    168,    187,    224,    268,
                289,    343,
             ...
             360102, 360130, 360175, 360190, 360252, 360261, 360302, 360356,
             360364, 360400],
            dtype='int64', length=10298),
 'fangshan_aq': Int64Index([    32,     36,     71,    131,    162,    201,    220,    254,
                309,    348,
             ...
             360104, 360134, 360156, 360217, 360222, 360286, 360315, 360329,
             360368, 360422],
            dtype='int64', length=10298),
 'fengtaihuayuan_aq': Int64Index([     3,     63,    102,    139,    157,    189,    238,    259,
                290,    344,
             ...
             360098, 360146, 360169, 360206, 360220, 360289, 360294, 360345,
             360385, 360424],
            dtype='int64', length=10298),
 'guanyuan_aq': Int64Index([    22,     67,     82,    134,    173,    197,    234,    245,
                281,    349,
             ...
             360088, 360144, 360155, 360199, 360234, 360288, 360313, 360338,
             360377, 360429],
            dtype='int64', length=10298),
 'gucheng_aq': Int64Index([     4,     46,     86,    111,    154,    188,    242,    267,
                284,    340,
             ...
             360092, 360125, 360181, 360197, 360253, 360268, 360297, 360357,
             360367, 360405],
            dtype='int64', length=10298),
 'huairou_aq': Int64Index([    18,     38,     87,    130,    163,    192,    221,    261,
                302,    332,
             ...
             360113, 360132, 360171, 360194, 360249, 360255, 360292, 360354,
             360372, 360402],
            dtype='int64', length=10298),
 'liulihe_aq': Int64Index([    12,     64,     89,    112,    165,    196,    227,    277,
                285,    316,
             ...
             360112, 360141, 360176, 360205, 360240, 360266, 360308, 360343,
             360371, 360425],
            dtype='int64', length=10298),
 'mentougou_aq': Int64Index([    16,     37,     91,    122,    148,    176,    213,    251,
                298,    330,
             ...
             360105, 360138, 360173, 360204, 360233, 360256, 360305, 360340,
             360365, 360415],
            dtype='int64', length=10298),
 'miyun_aq': Int64Index([     8,     42,     99,    127,    152,    191,    241,    263,
                306,    335,
             ...
             360103, 360122, 360179, 360201, 360247, 360275, 360301, 360335,
             360362, 360413],
            dtype='int64', length=10298),
 'miyunshuiku_aq': Int64Index([    34,     60,     70,    106,    159,    202,    233,    270,
                307,    321,
             ...
             360091, 360120, 360183, 360215, 360232, 360280, 360291, 360337,
             360394, 360416],
            dtype='int64', length=10298),
 'nansanhuan_aq': Int64Index([    27,     49,     85,    135,    156,    181,    217,    274,
                310,    336,
             ...
             360110, 360136, 360157, 360192, 360248, 360278, 360298, 360339,
             360388, 360419],
            dtype='int64', length=10298),
 'nongzhanguan_aq': Int64Index([    33,     47,     75,    113,    170,    182,    223,    269,
                287,    317,
             ...
             360100, 360129, 360154, 360198, 360225, 360277, 360299, 360342,
             360384, 360426],
            dtype='int64', length=10298),
 'pingchang_aq': Int64Index([     2,     40,     92,    108,    150,    177,    232,    275,
                296,    326,
             ...
             360083, 360116, 360177, 360202, 360224, 360258, 360307, 360358,
             360381, 360399],
            dtype='int64', length=10298),
 'pinggu_aq': Int64Index([    31,     65,     72,    105,    140,    194,    230,    265,
                280,    345,
             ...
             360081, 360147, 360164, 360214, 360230, 360264, 360322, 360353,
             360386, 360403],
            dtype='int64', length=10298),
 'qianmen_aq': Int64Index([    26,     41,     79,    137,    145,    209,    235,    266,
                286,    315,
             ...
             360107, 360139, 360163, 360186, 360223, 360263, 360300, 360355,
             360391, 360397],
            dtype='int64', length=10298),
 'shunyi_aq': Int64Index([    17,     66,     74,    119,    146,    180,    228,    249,
                293,    341,
             ...
             360114, 360133, 360162, 360200, 360250, 360276, 360304, 360330,
             360361, 360408],
            dtype='int64', length=10298),
 'tiantan_aq': Int64Index([     5,     50,     95,    107,    151,    183,    239,    278,
                288,    346,
             ...
             360090, 360118, 360178, 360209, 360243, 360274, 360312, 360331,
             360366, 360420],
            dtype='int64', length=10298),
 'tongzhou_aq': Int64Index([     1,     53,     77,    123,    142,    207,    226,    247,
                300,    318,
             ...
             360085, 360149, 360168, 360211, 360237, 360262, 360317, 360328,
             360375, 360428],
            dtype='int64', length=10298),
 'wanliu_aq': Int64Index([    21,     54,     73,    124,    141,    199,    244,    250,
                303,    319,
             ...
             360106, 360135, 360167, 360188, 360254, 360270, 360319, 360325,
             360360, 360404],
            dtype='int64', length=10298),
 'wanshouxigong_aq': Int64Index([    20,     45,    103,    114,    153,    193,    211,    276,
                312,    324,
             ...
             360080, 360124, 360161, 360210, 360227, 360265, 360320, 360327,
             360380, 360395],
            dtype='int64', length=10298),
 'xizhimenbei_aq': Int64Index([    15,     62,     90,    118,    172,    186,    214,    253,
                305,    334,
             ...
             360111, 360115, 360182, 360213, 360228, 360281, 360290, 360347,
             360382, 360409],
            dtype='int64', length=10298),
 'yanqin_aq': Int64Index([     6,     59,    100,    120,    144,    175,    231,    255,
                295,    337,
             ...
             360087, 360117, 360160, 360216, 360239, 360285, 360316, 360349,
             360379, 360412],
            dtype='int64', length=10298),
 'yizhuang_aq': Int64Index([     0,     56,     97,    116,    174,    204,    212,    262,
                313,    327,
             ...
             360086, 360145, 360166, 360212, 360238, 360272, 360324, 360333,
             360374, 360427],
            dtype='int64', length=10298),
 'yongdingmennei_aq': Int64Index([    23,     35,    104,    121,    171,    190,    215,    271,
                297,    328,
             ...
             360094, 360131, 360174, 360187, 360245, 360267, 360311, 360344,
             360370, 360418],
            dtype='int64', length=10298),
 'yongledian_aq': Int64Index([    24,     39,     80,    136,    160,    185,    210,    264,
                282,    339,
             ...
             360095, 360126, 360184, 360189, 360251, 360259, 360303, 360350,
             360363, 360398],
            dtype='int64', length=10298),
 'yufa_aq': Int64Index([    11,     55,     98,    126,    164,    184,    240,    258,
                283,    347,
             ...
             360101, 360119, 360150, 360203, 360231, 360284, 360321, 360346,
             360383, 360411],
            dtype='int64', length=10298),
 'yungang_aq': Int64Index([    13,     52,     88,    115,    147,    178,    218,    256,
                311,    325,
             ...
             360109, 360137, 360165, 360193, 360226, 360271, 360295, 360359,
             360389, 360406],
            dtype='int64', length=10298),
 'zhiwuyuan_aq': Int64Index([    25,     61,     81,    129,    158,    198,    236,    272,
                291,    342,
             ...
             360097, 360123, 360158, 360196, 360244, 360257, 360296, 360352,
             360376, 360407],
            dtype='int64', length=10298)}
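The `groups` output above shows 10298 rows for every station. Equal counts do not guarantee that every hour is present, though. The sketch below works on synthetic data. It counts rows per station with `size()`, which is more compact than printing `groups`. It then lists the hours missing from each station's record by comparing against a complete `pd.date_range`. The helper `missing_hours` is hypothetical. The hourly frequency is an assumption based on the timestamps shown.

```python
import pandas as pd


def missing_hours(df, station_col='stationId', time_col='utc_time'):
    """Map each station to the hourly timestamps absent from its records."""
    times = pd.to_datetime(df[time_col])
    # Complete hourly grid spanning the whole dataset
    full_range = pd.date_range(times.min(), times.max(), freq='60min')
    result = {}
    for station, station_times in times.groupby(df[station_col]):
        result[station] = full_range.difference(pd.DatetimeIndex(station_times))
    return result


df = pd.DataFrame({
    'stationId': ['a', 'b', 'a', 'b', 'b'],
    'utc_time': ['2017-01-01 14:00:00', '2017-01-01 14:00:00',
                 '2017-01-01 16:00:00', '2017-01-01 15:00:00',
                 '2017-01-01 16:00:00'],
})

print(df.groupby('stationId').size())
gaps = missing_hours(df)
print(gaps)
```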
In [53]:
import gc
gc.collect()
# One PM2.5 time-series plot per Beijing station, with time on the x-axis
for key, temp_data in beji_aqi_summary.groupby('stationId'):
    plt.figure(figsize=(20,10))
    plt.title(key,fontsize=18)
    plt.plot(pd.to_datetime(temp_data['utc_time']), temp_data['PM2.5'])
    plt.xticks(fontsize=15)
    plt.show()
[Figures: PM2.5 time series for each of the 35 Beijing stations, one plot per station]
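More than a year of hourly PM2.5 is dense enough that single-station plots are hard to read. One option is to index each station's series by time and average it per day with `resample('D').mean()` before plotting. The sketch below does this on synthetic data. The helper `daily_mean_pm25` is hypothetical.

```python
import numpy as np
import pandas as pd


def daily_mean_pm25(df, station_col='stationId', time_col='utc_time', value_col='PM2.5'):
    """Daily mean of value_col with one column per station; NaN readings are ignored."""
    hourly = df.assign(**{time_col: pd.to_datetime(df[time_col])})
    # Wide table: one row per timestamp, one column per station
    wide = hourly.pivot_table(index=time_col, columns=station_col,
                              values=value_col, aggfunc='mean')
    # Average each station's hourly values within each calendar day
    return wide.resample('D').mean()


df = pd.DataFrame({
    'stationId': ['a', 'b'] * 3,
    'utc_time': ['2017-01-01 22:00:00', '2017-01-01 22:00:00',
                 '2017-01-01 23:00:00', '2017-01-01 23:00:00',
                 '2017-01-02 00:00:00', '2017-01-02 00:00:00'],
    'PM2.5': [10.0, 1.0, 20.0, np.nan, 30.0, 5.0],
})

daily = daily_mean_pm25(df)
print(daily)
```

Each resulting column can then be plotted by itself, for example `daily['a'].plot()`.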
In [148]:
#missing values ratio: collect the stations with more than 15% of PM2.5 values missing
station_groups=beji_aqi_summary.groupby('stationId').groups
beijin_missing_pm25=[]
for key in station_groups:
    # .groups maps each station to its index labels, so select the rows with .loc
    temp_data=beji_aqi_summary.loc[station_groups[key]]
    ratio=float(temp_data['PM2.5'].isnull().mean())
    if ratio > 0.15:
        print(key)
        beijin_missing_pm25.append(key)
dongsihuan_aq
zhiwuyuan_aq
liulihe_aq
nansanhuan_aq
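The loop above can also be written as one grouped expression: take the mean of `isnull()` per station, then keep the stations above the threshold. The sketch below runs on synthetic data. The helper `stations_above_missing_ratio` is hypothetical. The default threshold of 0.15 is the one used above.

```python
import numpy as np
import pandas as pd


def stations_above_missing_ratio(df, column='PM2.5', threshold=0.15, station_col='stationId'):
    """Return the stations whose share of missing values in `column` exceeds threshold."""
    ratio = df[column].isnull().astype(float).groupby(df[station_col]).mean()
    return sorted(ratio[ratio > threshold].index)


df = pd.DataFrame({
    'stationId': ['a'] * 4 + ['b'] * 8,
    'PM2.5': [1.0, np.nan, np.nan, 4.0] + [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, np.nan],
})

print(stations_above_missing_ratio(df))  # station a: 0.5 missing, station b: 0.125
```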

Labeling on the map (back to geographic EDA)

The stations whose PM2.5 missing-value ratio is above 0.15 are marked in red on the map below:

In [149]:
# Center the map on the first Beijing station; the two coordinate columns are
# reversed into the [latitude, longitude] order that folium expects
temp=beijingAqCsv_sta.loc[beijingAqCsv_sta['Pollutant Species']==list(air_station_beij.groups.keys())[0]].iloc[:,1:].values[0]
temp=temp.astype(float).tolist()
temp=temp[::-1]
map_beijin_1 = folium.Map(location=temp, zoom_start=9,tiles='Stamen Terrain')
for key in location_beijin:
    if key in beijin_missing_pm25:
        # Red info-sign marker: more than 15% of this station's PM2.5 values are missing
        folium.Marker(location=location_beji[key],
                      popup='Missing value ratio > 0.15 Location',
                      icon=folium.Icon(color='red',icon='info-sign')).add_to(map_beijin_1)
    else:
        folium.Marker(location=location_beji[key]).add_to(map_beijin_1)
map_beijin_1
Out[149]:

London

In [46]:
pd.read_csv(path+"London_historical_aqi_forecast_stations_20180331.csv").iloc[:,1:]
Out[46]:
MeasurementDateGMT station_id PM2.5 (ug/m3) PM10 (ug/m3) NO2 (ug/m3)
0 2017/1/1 0:00 CD1 40.0 44.4 36.6
1 2017/1/1 1:00 CD1 31.6 34.4 46.2
2 2017/1/1 2:00 CD1 24.7 28.1 38.3
3 2017/1/1 3:00 CD1 21.2 24.5 32.8
4 2017/1/1 4:00 CD1 24.9 23.0 28.1
5 2017/1/1 5:00 CD1 24.6 23.9 29.3
6 2017/1/1 6:00 CD1 23.9 22.0 28.8
7 2017/1/1 7:00 CD1 22.0 22.9 34.6
8 2017/1/1 8:00 CD1 19.0 20.1 44.6
9 2017/1/1 9:00 CD1 19.9 24.4 55.3
10 2017/1/1 10:00 CD1 16.6 17.5 46.4
11 2017/1/1 11:00 CD1 14.5 14.6 42.5
12 2017/1/1 12:00 CD1 11.0 13.9 44.5
13 2017/1/1 13:00 CD1 13.9 13.4 53.6
14 2017/1/1 14:00 CD1 8.3 8.6 61.4
15 2017/1/1 15:00 CD1 7.3 6.1 57.9
16 2017/1/1 16:00 CD1 4.9 6.1 46.0
17 2017/1/1 17:00 CD1 4.6 8.1 96.5
18 2017/1/1 18:00 CD1 8.3 7.3 64.2
19 2017/1/1 19:00 CD1 7.3 8.9 59.7
20 2017/1/1 20:00 CD1 7.8 11.3 61.3
21 2017/1/1 21:00 CD1 11.1 9.8 57.0
22 2017/1/1 22:00 CD1 7.1 13.0 50.9
23 2017/1/1 23:00 CD1 6.8 11.4 40.2
24 2017/1/2 0:00 CD1 6.2 7.3 33.9
25 2017/1/2 1:00 CD1 9.0 9.9 21.7
26 2017/1/2 2:00 CD1 8.2 6.4 19.5
27 2017/1/2 3:00 CD1 7.0 9.1 18.8
28 2017/1/2 4:00 CD1 6.7 8.6 19.7
29 2017/1/2 5:00 CD1 7.4 13.3 27.7
... ... ... ... ... ...
141631 2018/3/29 19:00 TH4 8.5 16.7 85.9
141632 2018/3/29 20:00 TH4 8.8 19.4 89.6
141633 2018/3/29 21:00 TH4 8.8 17.9 82.4
141634 2018/3/29 22:00 TH4 5.0 14.5 61.8
141635 2018/3/29 23:00 TH4 4.6 14.2 67.6
141636 2018/3/30 0:00 TH4 4.8 11.8 55.4
141637 2018/3/30 1:00 TH4 2.0 11.4 47.4
141638 2018/3/30 2:00 TH4 4.2 13.5 51.4
141639 2018/3/30 3:00 TH4 3.1 13.8 45.8
141640 2018/3/30 4:00 TH4 5.0 12.6 45.4
141641 2018/3/30 5:00 TH4 4.4 13.1 45.6
141642 2018/3/30 6:00 TH4 8.2 14.6 47.4
141643 2018/3/30 7:00 TH4 8.2 18.3 36.6
141644 2018/3/30 8:00 TH4 9.0 15.6 35.0
141645 2018/3/30 9:00 TH4 9.1 17.7 38.0
141646 2018/3/30 10:00 TH4 10.3 12.7 35.3
141647 2018/3/30 11:00 TH4 12.0 13.2 29.4
141648 2018/3/30 12:00 TH4 9.1 13.5 26.2
141649 2018/3/30 13:00 TH4 7.7 15.0 35.3
141650 2018/3/30 14:00 TH4 7.0 12.2 36.6
141651 2018/3/30 15:00 TH4 6.4 11.0 30.9
141652 2018/3/30 16:00 TH4 6.4 14.0 39.8
141653 2018/3/30 17:00 TH4 14.4 16.6 55.2
141654 2018/3/30 18:00 TH4 11.2 18.8 63.3
141655 2018/3/30 19:00 TH4 6.3 16.1 67.7
141656 2018/3/30 20:00 TH4 3.5 11.2 44.3
141657 2018/3/30 21:00 TH4 4.7 12.3 52.8
141658 2018/3/30 22:00 TH4 5.4 14.0 54.7
141659 2018/3/30 23:00 TH4 8.9 16.5 47.0
141660 2018/3/31 0:00 TH4 NaN NaN NaN

141661 rows × 5 columns
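London timestamps use a different text format (`2017/1/1 0:00`) from Beijing's `utc_time` (`2017-01-01 14:00:00`). Parsing both into datetimes with an explicit `format` makes them comparable and enables time-based tools such as `resample`. The minimal sketch below uses a few values copied from the tables above.

```python
import pandas as pd

london_times = pd.Series(['2017/1/1 0:00', '2017/1/1 13:00', '2018/3/30 23:00'])
beijing_times = pd.Series(['2017-01-01 14:00:00'])

# %m, %d and %H also accept numbers without a leading zero, such as "1" or "0"
london_parsed = pd.to_datetime(london_times, format='%Y/%m/%d %H:%M')
beijing_parsed = pd.to_datetime(beijing_times, format='%Y-%m-%d %H:%M:%S')
print(london_parsed)
print(beijing_parsed)
```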

In [47]:
pd.read_csv(path+"London_historical_aqi_other_stations_20180331.csv").iloc[:,:5]
/home/paslab/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
Out[47]:
Station_ID MeasurementDateGMT PM2.5 (ug/m3) PM10 (ug/m3) NO2 (ug/m3)
0 LH0 2017/1/1 0:00 30.2 34.6 15.9
1 LH0 2017/1/1 1:00 25.4 29.2 11.8
2 LH0 2017/1/1 2:00 24.7 28.1 11.6
3 LH0 2017/1/1 3:00 23.6 27.0 13.0
4 LH0 2017/1/1 4:00 24.2 27.4 27.1
5 LH0 2017/1/1 5:00 22.8 26.0 22.9
6 LH0 2017/1/1 6:00 21.6 24.8 26.8
7 LH0 2017/1/1 7:00 19.9 23.1 39.4
8 LH0 2017/1/1 8:00 18.3 21.3 41.6
9 LH0 2017/1/1 9:00 16.3 19.5 44.1
10 LH0 2017/1/1 10:00 13.3 16.2 49.1
11 LH0 2017/1/1 11:00 9.4 11.6 45.2
12 LH0 2017/1/1 12:00 6.1 8.5 41.4
13 LH0 2017/1/1 13:00 6.7 13.4 53.6
14 LH0 2017/1/1 14:00 2.1 4.6 11.7
15 LH0 2017/1/1 15:00 0.9 2.0 12.1
16 LH0 2017/1/1 16:00 1.1 2.1 12.0
17 LH0 2017/1/1 17:00 1.0 2.0 13.5
18 LH0 2017/1/1 18:00 1.5 2.5 14.6
19 LH0 2017/1/1 19:00 2.1 3.6 14.7
20 LH0 2017/1/1 20:00 2.9 4.8 15.5
21 LH0 2017/1/1 21:00 4.5 6.9 14.0
22 LH0 2017/1/1 22:00 4.9 7.8 11.6
23 LH0 2017/1/1 23:00 5.9 9.3 11.8
24 LH0 2017/1/2 0:00 5.1 8.4 9.6
25 LH0 2017/1/2 1:00 4.5 7.7 6.9
26 LH0 2017/1/2 2:00 4.3 7.7 7.3
27 LH0 2017/1/2 3:00 4.9 8.4 6.5
28 LH0 2017/1/2 4:00 6.7 10.8 7.1
29 LH0 2017/1/2 5:00 7.0 13.1 15.3
... ... ... ... ... ...
141603 NaN NaN NaN NaN NaN
141604 NaN NaN NaN NaN NaN
141605 NaN NaN NaN NaN NaN
141606 NaN NaN NaN NaN NaN
141607 NaN NaN NaN NaN NaN
141608 NaN NaN NaN NaN NaN
141609 NaN NaN NaN NaN NaN
141610 NaN NaN NaN NaN NaN
141611 NaN NaN NaN NaN NaN
141612 NaN NaN NaN NaN NaN
141613 NaN NaN NaN NaN NaN
141614 NaN NaN NaN NaN NaN
141615 NaN NaN NaN NaN NaN
141616 NaN NaN NaN NaN NaN
141617 NaN NaN NaN NaN NaN
141618 NaN NaN NaN NaN NaN
141619 NaN NaN NaN NaN NaN
141620 NaN NaN NaN NaN NaN
141621 NaN NaN NaN NaN NaN
141622 NaN NaN NaN NaN NaN
141623 NaN NaN NaN NaN NaN
141624 NaN NaN NaN NaN NaN
141625 NaN NaN NaN NaN NaN
141626 NaN NaN NaN NaN NaN
141627 NaN NaN NaN NaN NaN
141628 NaN NaN NaN NaN NaN
141629 NaN NaN NaN NaN NaN
141630 NaN NaN NaN NaN NaN
141631 NaN NaN NaN NaN NaN
141632 NaN NaN NaN NaN NaN

141633 rows × 5 columns
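The `DtypeWarning` and the run of all-`NaN` rows at the end of this file point to blank trailing lines. The sketch below reads such a file with `low_memory=False`, as the warning suggests, so pandas infers each column's type from the whole file. It then drops rows that are entirely empty. An in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

csv_text = """Station_ID,MeasurementDateGMT,PM2.5 (ug/m3),PM10 (ug/m3),NO2 (ug/m3)
LH0,2017/1/1 0:00,30.2,34.6,15.9
LH0,2017/1/1 1:00,25.4,29.2,11.8
,,,,
,,,,
"""

# low_memory=False parses the file in one pass instead of in chunks,
# avoiding the mixed-type inference behind DtypeWarning
other = pd.read_csv(io.StringIO(csv_text), low_memory=False)
# Drop rows in which every field is empty
other = other.dropna(how='all').reset_index(drop=True)
print(other)
```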

In [48]:
London_sta=pd.read_csv(path+"London_historical_aqi_forecast_stations_20180331.csv").iloc[:,1:]
London_sta_other=pd.read_csv(path+"London_historical_aqi_other_stations_20180331.csv").iloc[:,:5]
/home/paslab/.local/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (0,1) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
In [50]:
London_sta_other.loc[~London_sta_other["Station_ID"].isnull(),:]
Out[50]:
Station_ID MeasurementDateGMT PM2.5 (ug/m3) PM10 (ug/m3) NO2 (ug/m3)
0 LH0 2017/1/1 0:00 30.2 34.6 15.9
1 LH0 2017/1/1 1:00 25.4 29.2 11.8
2 LH0 2017/1/1 2:00 24.7 28.1 11.6
3 LH0 2017/1/1 3:00 23.6 27.0 13.0
4 LH0 2017/1/1 4:00 24.2 27.4 27.1
5 LH0 2017/1/1 5:00 22.8 26.0 22.9
6 LH0 2017/1/1 6:00 21.6 24.8 26.8
7 LH0 2017/1/1 7:00 19.9 23.1 39.4
8 LH0 2017/1/1 8:00 18.3 21.3 41.6
9 LH0 2017/1/1 9:00 16.3 19.5 44.1
10 LH0 2017/1/1 10:00 13.3 16.2 49.1
11 LH0 2017/1/1 11:00 9.4 11.6 45.2
12 LH0 2017/1/1 12:00 6.1 8.5 41.4
13 LH0 2017/1/1 13:00 6.7 13.4 53.6
14 LH0 2017/1/1 14:00 2.1 4.6 11.7
15 LH0 2017/1/1 15:00 0.9 2.0 12.1
16 LH0 2017/1/1 16:00 1.1 2.1 12.0
17 LH0 2017/1/1 17:00 1.0 2.0 13.5
18 LH0 2017/1/1 18:00 1.5 2.5 14.6
19 LH0 2017/1/1 19:00 2.1 3.6 14.7
20 LH0 2017/1/1 20:00 2.9 4.8 15.5
21 LH0 2017/1/1 21:00 4.5 6.9 14.0
22 LH0 2017/1/1 22:00 4.9 7.8 11.6
23 LH0 2017/1/1 23:00 5.9 9.3 11.8
24 LH0 2017/1/2 0:00 5.1 8.4 9.6
25 LH0 2017/1/2 1:00 4.5 7.7 6.9
26 LH0 2017/1/2 2:00 4.3 7.7 7.3
27 LH0 2017/1/2 3:00 4.9 8.4 6.5
28 LH0 2017/1/2 4:00 6.7 10.8 7.1
29 LH0 2017/1/2 5:00 7.0 13.1 15.3
... ... ... ... ... ...
118644 CT2 2018/3/30 18:00 13.0 NaN NaN
118645 CT2 2018/3/30 19:00 8.0 NaN NaN
118646 CT2 2018/3/30 20:00 6.0 NaN NaN
118647 CT2 2018/3/30 21:00 5.0 NaN NaN
118648 CT2 2018/3/30 22:00 7.0 NaN NaN
118649 CT2 2018/3/30 23:00 10.0 NaN NaN
118650 CT2 2018/3/31 0:00 8.0 NaN NaN
118651 CT2 2018/3/31 1:00 6.0 NaN NaN
118652 CT2 2018/3/31 2:00 6.0 NaN NaN
118653 CT2 2018/3/31 3:00 5.0 NaN NaN
118654 CT2 2018/3/31 4:00 4.0 NaN NaN
118655 CT2 2018/3/31 5:00 6.0 NaN NaN
118656 CT2 2018/3/31 6:00 8.0 NaN NaN
118657 CT2 2018/3/31 7:00 7.0 NaN NaN
118658 CT2 2018/3/31 8:00 6.0 NaN NaN
118659 CT2 2018/3/31 9:00 NaN NaN NaN
118660 CT2 2018/3/31 10:00 NaN NaN NaN
118661 CT2 2018/3/31 11:00 NaN NaN NaN
118662 CT2 2018/3/31 12:00 NaN NaN NaN
118663 CT2 2018/3/31 13:00 NaN NaN NaN
118664 CT2 2018/3/31 14:00 NaN NaN NaN
118665 CT2 2018/3/31 15:00 NaN NaN NaN
118666 CT2 2018/3/31 16:00 NaN NaN NaN
118667 CT2 2018/3/31 17:00 NaN NaN NaN
118668 CT2 2018/3/31 18:00 NaN NaN NaN
118669 CT2 2018/3/31 19:00 NaN NaN NaN
118670 CT2 2018/3/31 20:00 NaN NaN NaN
118671 CT2 2018/3/31 21:00 NaN NaN NaN
118672 CT2 2018/3/31 22:00 NaN NaN NaN
118673 CT2 2018/3/31 23:00 NaN NaN NaN

118674 rows × 5 columns

In [51]:
# Drop the blank trailing rows (rows without a Station_ID)
London_sta_other=London_sta_other.loc[~London_sta_other["Station_ID"].isnull(),:]
# Lower-case the column names so both tables spell them the same way
London_sta_other.columns = London_sta_other.columns.str.lower()
London_sta.columns = London_sta.columns.str.lower()
# Position-by-position comparison of the two column lists
London_sta_other.columns==London_sta.columns
Out[51]:
array([False, False,  True,  True,  True])
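Using `==` on two column indexes compares them position by position. So `[False, False, True, True, True]` only says that the first two names appear in a different order. Comparing the names as sets checks that both tables carry the same columns, whatever their order. The sketch below applies both checks to the lower-cased column names of the two tables.

```python
import pandas as pd

forecast_cols = pd.Index(['measurementdategmt', 'station_id',
                          'pm2.5 (ug/m3)', 'pm10 (ug/m3)', 'no2 (ug/m3)'])
other_cols = pd.Index(['station_id', 'measurementdategmt',
                       'pm2.5 (ug/m3)', 'pm10 (ug/m3)', 'no2 (ug/m3)'])

positional = (other_cols == forecast_cols)            # element-wise, order-sensitive
same_columns = set(other_cols) == set(forecast_cols)  # order-insensitive
# Index.equals is also order-sensitive, so it reports False here
print(positional, same_columns, other_cols.equals(forecast_cols))
```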
In [53]:
London_sta
Out[53]:
measurementdategmt station_id pm2.5 (ug/m3) pm10 (ug/m3) no2 (ug/m3)
0 2017/1/1 0:00 CD1 40.0 44.4 36.6
1 2017/1/1 1:00 CD1 31.6 34.4 46.2
2 2017/1/1 2:00 CD1 24.7 28.1 38.3
3 2017/1/1 3:00 CD1 21.2 24.5 32.8
4 2017/1/1 4:00 CD1 24.9 23.0 28.1
5 2017/1/1 5:00 CD1 24.6 23.9 29.3
6 2017/1/1 6:00 CD1 23.9 22.0 28.8
7 2017/1/1 7:00 CD1 22.0 22.9 34.6
8 2017/1/1 8:00 CD1 19.0 20.1 44.6
9 2017/1/1 9:00 CD1 19.9 24.4 55.3
10 2017/1/1 10:00 CD1 16.6 17.5 46.4
11 2017/1/1 11:00 CD1 14.5 14.6 42.5
12 2017/1/1 12:00 CD1 11.0 13.9 44.5
13 2017/1/1 13:00 CD1 13.9 13.4 53.6
14 2017/1/1 14:00 CD1 8.3 8.6 61.4
15 2017/1/1 15:00 CD1 7.3 6.1 57.9
16 2017/1/1 16:00 CD1 4.9 6.1 46.0
17 2017/1/1 17:00 CD1 4.6 8.1 96.5
18 2017/1/1 18:00 CD1 8.3 7.3 64.2
19 2017/1/1 19:00 CD1 7.3 8.9 59.7
20 2017/1/1 20:00 CD1 7.8 11.3 61.3
21 2017/1/1 21:00 CD1 11.1 9.8 57.0
22 2017/1/1 22:00 CD1 7.1 13.0 50.9
23 2017/1/1 23:00 CD1 6.8 11.4 40.2
24 2017/1/2 0:00 CD1 6.2 7.3 33.9
25 2017/1/2 1:00 CD1 9.0 9.9 21.7
26 2017/1/2 2:00 CD1 8.2 6.4 19.5
27 2017/1/2 3:00 CD1 7.0 9.1 18.8
28 2017/1/2 4:00 CD1 6.7 8.6 19.7
29 2017/1/2 5:00 CD1 7.4 13.3 27.7
... ... ... ... ... ...
141631 2018/3/29 19:00 TH4 8.5 16.7 85.9
141632 2018/3/29 20:00 TH4 8.8 19.4 89.6
141633 2018/3/29 21:00 TH4 8.8 17.9 82.4
141634 2018/3/29 22:00 TH4 5.0 14.5 61.8
141635 2018/3/29 23:00 TH4 4.6 14.2 67.6
141636 2018/3/30 0:00 TH4 4.8 11.8 55.4
141637 2018/3/30 1:00 TH4 2.0 11.4 47.4
141638 2018/3/30 2:00 TH4 4.2 13.5 51.4
141639 2018/3/30 3:00 TH4 3.1 13.8 45.8
141640 2018/3/30 4:00 TH4 5.0 12.6 45.4
141641 2018/3/30 5:00 TH4 4.4 13.1 45.6
141642 2018/3/30 6:00 TH4 8.2 14.6 47.4
141643 2018/3/30 7:00 TH4 8.2 18.3 36.6
141644 2018/3/30 8:00 TH4 9.0 15.6 35.0
141645 2018/3/30 9:00 TH4 9.1 17.7 38.0
141646 2018/3/30 10:00 TH4 10.3 12.7 35.3
141647 2018/3/30 11:00 TH4 12.0 13.2 29.4
141648 2018/3/30 12:00 TH4 9.1 13.5 26.2
141649 2018/3/30 13:00 TH4 7.7 15.0 35.3
141650 2018/3/30 14:00 TH4 7.0 12.2 36.6
141651 2018/3/30 15:00 TH4 6.4 11.0 30.9
141652 2018/3/30 16:00 TH4 6.4 14.0 39.8
141653 2018/3/30 17:00 TH4 14.4 16.6 55.2
141654 2018/3/30 18:00 TH4 11.2 18.8 63.3
141655 2018/3/30 19:00 TH4 6.3 16.1 67.7
141656 2018/3/30 20:00 TH4 3.5 11.2 44.3
141657 2018/3/30 21:00 TH4 4.7 12.3 52.8
141658 2018/3/30 22:00 TH4 5.4 14.0 54.7
141659 2018/3/30 23:00 TH4 8.9 16.5 47.0
141660 2018/3/31 0:00 TH4 NaN NaN NaN

141661 rows × 5 columns

In [52]:
London_sta_other
Out[52]:
station_id measurementdategmt pm2.5 (ug/m3) pm10 (ug/m3) no2 (ug/m3)
0 LH0 2017/1/1 0:00 30.2 34.6 15.9
1 LH0 2017/1/1 1:00 25.4 29.2 11.8
2 LH0 2017/1/1 2:00 24.7 28.1 11.6
3 LH0 2017/1/1 3:00 23.6 27.0 13.0
4 LH0 2017/1/1 4:00 24.2 27.4 27.1
5 LH0 2017/1/1 5:00 22.8 26.0 22.9
6 LH0 2017/1/1 6:00 21.6 24.8 26.8
7 LH0 2017/1/1 7:00 19.9 23.1 39.4
8 LH0 2017/1/1 8:00 18.3 21.3 41.6
9 LH0 2017/1/1 9:00 16.3 19.5 44.1
10 LH0 2017/1/1 10:00 13.3 16.2 49.1
11 LH0 2017/1/1 11:00 9.4 11.6 45.2
12 LH0 2017/1/1 12:00 6.1 8.5 41.4
13 LH0 2017/1/1 13:00 6.7 13.4 53.6
14 LH0 2017/1/1 14:00 2.1 4.6 11.7
15 LH0 2017/1/1 15:00 0.9 2.0 12.1
16 LH0 2017/1/1 16:00 1.1 2.1 12.0
17 LH0 2017/1/1 17:00 1.0 2.0 13.5
18 LH0 2017/1/1 18:00 1.5 2.5 14.6
19 LH0 2017/1/1 19:00 2.1 3.6 14.7
20 LH0 2017/1/1 20:00 2.9 4.8 15.5
21 LH0 2017/1/1 21:00 4.5 6.9 14.0
22 LH0 2017/1/1 22:00 4.9 7.8 11.6
23 LH0 2017/1/1 23:00 5.9 9.3 11.8
24 LH0 2017/1/2 0:00 5.1 8.4 9.6
25 LH0 2017/1/2 1:00 4.5 7.7 6.9
26 LH0 2017/1/2 2:00 4.3 7.7 7.3
27 LH0 2017/1/2 3:00 4.9 8.4 6.5
28 LH0 2017/1/2 4:00 6.7 10.8 7.1
29 LH0 2017/1/2 5:00 7.0 13.1 15.3
... ... ... ... ... ...
118644 CT2 2018/3/30 18:00 13.0 NaN NaN
118645 CT2 2018/3/30 19:00 8.0 NaN NaN
118646 CT2 2018/3/30 20:00 6.0 NaN NaN
118647 CT2 2018/3/30 21:00 5.0 NaN NaN
118648 CT2 2018/3/30 22:00 7.0 NaN NaN
118649 CT2 2018/3/30 23:00 10.0 NaN NaN
118650 CT2 2018/3/31 0:00 8.0 NaN NaN
118651 CT2 2018/3/31 1:00 6.0 NaN NaN
118652 CT2 2018/3/31 2:00 6.0 NaN NaN
118653 CT2 2018/3/31 3:00 5.0 NaN NaN
118654 CT2 2018/3/31 4:00 4.0 NaN NaN
118655 CT2 2018/3/31 5:00 6.0 NaN NaN
118656 CT2 2018/3/31 6:00 8.0 NaN NaN
118657 CT2 2018/3/31 7:00 7.0 NaN NaN
118658 CT2 2018/3/31 8:00 6.0 NaN NaN
118659 CT2 2018/3/31 9:00 NaN NaN NaN
118660 CT2 2018/3/31 10:00 NaN NaN NaN
118661 CT2 2018/3/31 11:00 NaN NaN NaN
118662 CT2 2018/3/31 12:00 NaN NaN NaN
118663 CT2 2018/3/31 13:00 NaN NaN NaN
118664 CT2 2018/3/31 14:00 NaN NaN NaN
118665 CT2 2018/3/31 15:00 NaN NaN NaN
118666 CT2 2018/3/31 16:00 NaN NaN NaN
118667 CT2 2018/3/31 17:00 NaN NaN NaN
118668 CT2 2018/3/31 18:00 NaN NaN NaN
118669 CT2 2018/3/31 19:00 NaN NaN NaN
118670 CT2 2018/3/31 20:00 NaN NaN NaN
118671 CT2 2018/3/31 21:00 NaN NaN NaN
118672 CT2 2018/3/31 22:00 NaN NaN NaN
118673 CT2 2018/3/31 23:00 NaN NaN NaN

118674 rows × 5 columns

In [58]:
# Reorder London_sta_other's columns to match London_sta (date first, then station_id)
London_sta_other.reindex(columns=list(London_sta.columns))
Out[58]:
measurementdategmt station_id pm2.5 (ug/m3) pm10 (ug/m3) no2 (ug/m3)
0 2017/1/1 0:00 LH0 30.2 34.6 15.9
1 2017/1/1 1:00 LH0 25.4 29.2 11.8
2 2017/1/1 2:00 LH0 24.7 28.1 11.6
3 2017/1/1 3:00 LH0 23.6 27.0 13.0
4 2017/1/1 4:00 LH0 24.2 27.4 27.1
5 2017/1/1 5:00 LH0 22.8 26.0 22.9
6 2017/1/1 6:00 LH0 21.6 24.8 26.8
7 2017/1/1 7:00 LH0 19.9 23.1 39.4
8 2017/1/1 8:00 LH0 18.3 21.3 41.6
9 2017/1/1 9:00 LH0 16.3 19.5 44.1
10 2017/1/1 10:00 LH0 13.3 16.2 49.1
11 2017/1/1 11:00 LH0 9.4 11.6 45.2
12 2017/1/1 12:00 LH0 6.1 8.5 41.4
13 2017/1/1 13:00 LH0 6.7 13.4 53.6
14 2017/1/1 14:00 LH0 2.1 4.6 11.7
15 2017/1/1 15:00 LH0 0.9 2.0 12.1
16 2017/1/1 16:00 LH0 1.1 2.1 12.0
17 2017/1/1 17:00 LH0 1.0 2.0 13.5
18 2017/1/1 18:00 LH0 1.5 2.5 14.6
19 2017/1/1 19:00 LH0 2.1 3.6 14.7
20 2017/1/1 20:00 LH0 2.9 4.8 15.5
21 2017/1/1 21:00 LH0 4.5 6.9 14.0
22 2017/1/1 22:00 LH0 4.9 7.8 11.6
23 2017/1/1 23:00 LH0 5.9 9.3 11.8
24 2017/1/2 0:00 LH0 5.1 8.4 9.6
25 2017/1/2 1:00 LH0 4.5 7.7 6.9
26 2017/1/2 2:00 LH0 4.3 7.7 7.3
27 2017/1/2 3:00 LH0 4.9 8.4 6.5
28 2017/1/2 4:00 LH0 6.7 10.8 7.1
29 2017/1/2 5:00 LH0 7.0 13.1 15.3
... ... ... ... ... ...
118644 2018/3/30 18:00 CT2 13.0 NaN NaN
118645 2018/3/30 19:00 CT2 8.0 NaN NaN
118646 2018/3/30 20:00 CT2 6.0 NaN NaN
118647 2018/3/30 21:00 CT2 5.0 NaN NaN
118648 2018/3/30 22:00 CT2 7.0 NaN NaN
118649 2018/3/30 23:00 CT2 10.0 NaN NaN
118650 2018/3/31 0:00 CT2 8.0 NaN NaN
118651 2018/3/31 1:00 CT2 6.0 NaN NaN
118652 2018/3/31 2:00 CT2 6.0 NaN NaN
118653 2018/3/31 3:00 CT2 5.0 NaN NaN
118654 2018/3/31 4:00 CT2 4.0 NaN NaN
118655 2018/3/31 5:00 CT2 6.0 NaN NaN
118656 2018/3/31 6:00 CT2 8.0 NaN NaN
118657 2018/3/31 7:00 CT2 7.0 NaN NaN
118658 2018/3/31 8:00 CT2 6.0 NaN NaN
118659 2018/3/31 9:00 CT2 NaN NaN NaN
118660 2018/3/31 10:00 CT2 NaN NaN NaN
118661 2018/3/31 11:00 CT2 NaN NaN NaN
118662 2018/3/31 12:00 CT2 NaN NaN NaN
118663 2018/3/31 13:00 CT2 NaN NaN NaN
118664 2018/3/31 14:00 CT2 NaN NaN NaN
118665 2018/3/31 15:00 CT2 NaN NaN NaN
118666 2018/3/31 16:00 CT2 NaN NaN NaN
118667 2018/3/31 17:00 CT2 NaN NaN NaN
118668 2018/3/31 18:00 CT2 NaN NaN NaN
118669 2018/3/31 19:00 CT2 NaN NaN NaN
118670 2018/3/31 20:00 CT2 NaN NaN NaN
118671 2018/3/31 21:00 CT2 NaN NaN NaN
118672 2018/3/31 22:00 CT2 NaN NaN NaN
118673 2018/3/31 23:00 CT2 NaN NaN NaN

118674 rows × 5 columns
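After `reindex`, both London tables share the same column order and can be stacked. Note that `pd.concat` keeps each table's original row labels. Both tables start at 0, so the stacked frame has duplicate index values. The sketch below uses two tiny frames. It passes `ignore_index=True` to renumber the rows. It then sorts by station and parsed timestamp so that each station's series stays together.

```python
import pandas as pd

cols = ['measurementdategmt', 'station_id', 'pm2.5 (ug/m3)']
forecast = pd.DataFrame([['2017/1/1 1:00', 'CD1', 31.6],
                         ['2017/1/1 0:00', 'CD1', 40.0]], columns=cols)
# The other-stations table arrives with its first two columns swapped
other = pd.DataFrame([['LH0', '2017/1/1 0:00', 30.2]],
                     columns=['station_id', 'measurementdategmt', 'pm2.5 (ug/m3)'])

# Align the column order, stack, and renumber the rows 0..n-1
other = other.reindex(columns=cols)
london = pd.concat([other, forecast], axis=0, ignore_index=True)
london['measurementdategmt'] = pd.to_datetime(london['measurementdategmt'],
                                              format='%Y/%m/%d %H:%M')
london = london.sort_values(['station_id', 'measurementdategmt']).reset_index(drop=True)
print(london)
```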

In [60]:
London_sta_other=London_sta_other.reindex(columns=list(London_sta.columns))
pd.concat([London_sta_other,London_sta],axis=0)
Out[60]:
measurementdategmt station_id pm2.5 (ug/m3) pm10 (ug/m3) no2 (ug/m3)
0 2017/1/1 0:00 LH0 30.2 34.6 15.9
1 2017/1/1 1:00 LH0 25.4 29.2 11.8
2 2017/1/1 2:00 LH0 24.7 28.1 11.6
3 2017/1/1 3:00 LH0 23.6 27.0 13.0
4 2017/1/1 4:00 LH0 24.2 27.4 27.1
5 2017/1/1 5:00 LH0 22.8 26.0 22.9
6 2017/1/1 6:00 LH0 21.6 24.8 26.8
7 2017/1/1 7:00 LH0 19.9 23.1 39.4
8 2017/1/1 8:00 LH0 18.3 21.3 41.6
9 2017/1/1 9:00 LH0 16.3 19.5 44.1
10 2017/1/1 10:00 LH0 13.3 16.2 49.1
11 2017/1/1 11:00 LH0 9.4 11.6 45.2
12 2017/1/1 12:00 LH0 6.1 8.5 41.4
13 2017/1/1 13:00 LH0 6.7 13.4 53.6
14 2017/1/1 14:00 LH0 2.1 4.6 11.7
15 2017/1/1 15:00 LH0 0.9 2.0 12.1
16 2017/1/1 16:00 LH0 1.1 2.1 12.0
17 2017/1/1 17:00 LH0 1.0 2.0 13.5
18 2017/1/1 18:00 LH0 1.5 2.5 14.6
19 2017/1/1 19:00 LH0 2.1 3.6 14.7
20 2017/1/1 20:00 LH0 2.9 4.8 15.5
21 2017/1/1 21:00 LH0 4.5 6.9 14.0
22 2017/1/1 22:00 LH0 4.9 7.8 11.6
23 2017/1/1 23:00 LH0 5.9 9.3 11.8
24 2017/1/2 0:00 LH0 5.1 8.4 9.6
25 2017/1/2 1:00 LH0 4.5 7.7 6.9
26 2017/1/2 2:00 LH0 4.3 7.7 7.3
27 2017/1/2 3:00 LH0 4.9 8.4 6.5
28 2017/1/2 4:00 LH0 6.7 10.8 7.1
29 2017/1/2 5:00 LH0 7.0 13.1 15.3
... ... ... ... ... ...
141631 2018/3/29 19:00 TH4 8.5 16.7 85.9
141632 2018/3/29 20:00 TH4 8.8 19.4 89.6
141633 2018/3/29 21:00 TH4 8.8 17.9 82.4
141634 2018/3/29 22:00 TH4 5.0 14.5 61.8
141635 2018/3/29 23:00 TH4 4.6 14.2 67.6
141636 2018/3/30 0:00 TH4 4.8 11.8 55.4
141637 2018/3/30 1:00 TH4 2.0 11.4 47.4
141638 2018/3/30 2:00 TH4 4.2 13.5 51.4
141639 2018/3/30 3:00 TH4 3.1 13.8 45.8
141640 2018/3/30 4:00 TH4 5.0 12.6 45.4
141641 2018/3/30 5:00 TH4 4.4 13.1 45.6
141642 2018/3/30 6:00 TH4 8.2 14.6 47.4
141643 2018/3/30 7:00 TH4 8.2 18.3 36.6
141644 2018/3/30 8:00 TH4 9.0 15.6 35.0
141645 2018/3/30 9:00 TH4 9.1 17.7 38.0
141646 2018/3/30 10:00 TH4 10.3 12.7 35.3
141647 2018/3/30 11:00 TH4 12.0 13.2 29.4
141648 2018/3/30 12:00 TH4 9.1 13.5 26.2
141649 2018/3/30 13:00 TH4 7.7 15.0 35.3
141650 2018/3/30 14:00 TH4 7.0 12.2 36.6
141651 2018/3/30 15:00 TH4 6.4 11.0 30.9
141652 2018/3/30 16:00 TH4 6.4 14.0 39.8
141653 2018/3/30 17:00 TH4 14.4 16.6 55.2
141654 2018/3/30 18:00 TH4 11.2 18.8 63.3
141655 2018/3/30 19:00 TH4 6.3 16.1 67.7
141656 2018/3/30 20:00 TH4 3.5 11.2 44.3
141657 2018/3/30 21:00 TH4 4.7 12.3 52.8
141658 2018/3/30 22:00 TH4 5.4 14.0 54.7
141659 2018/3/30 23:00 TH4 8.9 16.5 47.0
141660 2018/3/31 0:00 TH4 NaN NaN NaN

260335 rows × 5 columns
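Each frame keeps its own 0-based index, so the concatenated output above repeats labels: its last row is labelled 141660 although there are 260335 rows. The sketch below uses small hypothetical toy frames, not the real London tables. It shows what the `reindex(columns=...)` step does, and it shows `ignore_index=True` as one way to get a unique index directly:

```python
import pandas as pd

# Hypothetical toy frames standing in for London_sta and London_sta_other:
# same measurements, but a different column order and one extra column.
sta = pd.DataFrame({
    "measurementdategmt": ["2017/1/1 0:00", "2017/1/1 1:00"],
    "station_id": ["CD9", "CD9"],
    "pm2.5 (ug/m3)": [28.7, 27.0],
})
sta_other = pd.DataFrame({
    "station_id": ["LH0", "LH0"],
    "pm2.5 (ug/m3)": [30.2, 25.4],
    "measurementdategmt": ["2017/1/1 0:00", "2017/1/1 1:00"],
    "extra": [1, 2],  # a column the reference table does not have
})

# reindex(columns=...) reorders to the reference layout and drops "extra";
# a reference column missing from sta_other would be filled with NaN.
sta_other = sta_other.reindex(columns=list(sta.columns))

# ignore_index=True builds a fresh 0..n-1 index instead of repeating
# each frame's own 0-based labels.
combined = pd.concat([sta_other, sta], axis=0, ignore_index=True)
print(combined)
```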

In [76]:
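# measurementdategmt is still a string here, so this sort is lexicographic, not chronological
London_comb=pd.concat([London_sta_other,London_sta],axis=0).sort_values(by=['measurementdategmt'])
# Replace the duplicated per-frame labels with a fresh 0..n-1 index
London_comb=London_comb.reset_index(drop=True)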
In [77]:
London_comb
Out[77]:
measurementdategmt station_id pm2.5 (ug/m3) pm10 (ug/m3) no2 (ug/m3)
0 2017/1/1 0:00 LH0 30.2 34.6 15.9
1 2017/1/1 0:00 CD9 28.7 32.3 90.6
2 2017/1/1 0:00 KF1 NaN NaN NaN
3 2017/1/1 0:00 GN0 50.7 63.3 24.7
4 2017/1/1 0:00 LW2 35.3 42.3 25.2
5 2017/1/1 0:00 GR9 31.7 38.4 26.6
6 2017/1/1 0:00 GN3 59.6 47.5 18.4
7 2017/1/1 0:00 HV1 37.8 40.2 26.0
8 2017/1/1 0:00 MY7 31.5 41.7 NaN
9 2017/1/1 0:00 GR4 31.2 30.5 8.5
10 2017/1/1 0:00 BL0 30.8 31.6 10.8
11 2017/1/1 0:00 CD1 40.0 44.4 36.6
12 2017/1/1 0:00 KC1 31.7 23.2 21.3
13 2017/1/1 0:00 BX9 23.3 NaN NaN
14 2017/1/1 0:00 BX1 23.3 16.1 7.0
15 2017/1/1 0:00 RB7 NaN 55.0 22.0
16 2017/1/1 0:00 CR8 30.0 NaN NaN
17 2017/1/1 0:00 GB0 NaN NaN NaN
18 2017/1/1 0:00 CT3 NaN 35.8 30.8
19 2017/1/1 0:00 TD5 NaN NaN NaN
20 2017/1/1 0:00 ST5 25.0 29.0 14.5
21 2017/1/1 0:00 TH4 28.4 33.3 38.9
22 2017/1/1 0:00 HR1 NaN 70.8 22.2
23 2017/1/1 10:00 LW2 14.3 16.2 27.2
24 2017/1/1 10:00 HV1 17.9 21.7 37.0
25 2017/1/1 10:00 KF1 NaN NaN NaN
26 2017/1/1 10:00 BX9 12.1 NaN NaN
27 2017/1/1 10:00 GN3 12.8 21.6 18.6
28 2017/1/1 10:00 CR8 13.0 NaN NaN
29 2017/1/1 10:00 CD9 12.0 16.9 77.1
... ... ... ... ... ...
260305 2018/3/9 8:00 GN0 14.0 28.5 84.2
260306 2018/3/9 8:00 KF1 9.4 20.1 NaN
260307 2018/3/9 8:00 RB7 19.0 35.0 86.7
260308 2018/3/9 8:00 CT2 18.0 NaN NaN
260309 2018/3/9 9:00 MY7 NaN 37.8 NaN
260310 2018/3/9 9:00 BX9 9.3 NaN NaN
260311 2018/3/9 9:00 CR8 8.0 NaN NaN
260312 2018/3/9 9:00 KF1 9.2 22.4 NaN
260313 2018/3/9 9:00 BX1 NaN 26.3 47.2
260314 2018/3/9 9:00 KC1 NaN NaN 69.2
260315 2018/3/9 9:00 GR4 13.6 24.8 43.4
260316 2018/3/9 9:00 HR1 NaN NaN NaN
260317 2018/3/9 9:00 GN0 11.7 26.1 64.8
260318 2018/3/9 9:00 GR9 9.9 34.5 79.5
260319 2018/3/9 9:00 ST5 7.0 34.0 56.0
260320 2018/3/9 9:00 TD5 11.6 NaN NaN
260321 2018/3/9 9:00 BL0 7.9 28.4 70.4
260322 2018/3/9 9:00 LW2 14.3 39.6 118.8
260323 2018/3/9 9:00 TH4 16.5 58.9 88.2
260324 2018/3/9 9:00 GN3 15.5 32.9 78.2
260325 2018/3/9 9:00 CD1 22.8 65.9 110.4
260326 2018/3/9 9:00 CT3 14.0 31.0 68.9
260327 2018/3/9 9:00 LH0 NaN NaN NaN
260328 2018/3/9 9:00 CT2 18.0 NaN NaN
260329 2018/3/9 9:00 CD9 18.0 32.7 154.4
260330 2018/3/9 9:00 RB7 17.0 28.2 79.8
260331 2018/3/9 9:00 GB0 12.7 NaN NaN
260332 2018/3/9 9:00 HV1 13.0 27.2 66.8
260333 2018/4/1 0:00 HR1 NaN NaN NaN
260334 2018/4/1 0:00 LH0 NaN NaN NaN

260335 rows × 5 columns
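The output above is not in time order. `2017/1/1 10:00` comes directly after `2017/1/1 0:00`, and the last March rows are dated `2018/3/9`. This happens because `measurementdategmt` is still a string, and as a string `'2018/3/9'` sorts after `'2018/3/31'`. The sketch below uses made-up timestamps in the same unpadded `Y/M/D H:MM` format and sorts them chronologically by parsing with `pd.to_datetime` first. The explicit `format` string is an assumption about the raw column:

```python
import pandas as pd

# Made-up rows in the raw London timestamp format (strings, no zero padding).
df = pd.DataFrame({
    "measurementdategmt": ["2017/1/1 10:00", "2017/1/1 2:00", "2017/1/1 0:00",
                           "2018/3/9 9:00", "2018/3/31 23:00"],
    "station_id": ["LH0", "LH0", "LH0", "TH4", "TH4"],
})

# String sort compares character by character: "10:00" < "2:00" and "3/31" < "3/9".
lexi = df.sort_values("measurementdategmt")["measurementdategmt"].tolist()

# Parse first (format assumed: year/month/day hour:minute), then sort by time and station.
df["measurementdategmt"] = pd.to_datetime(df["measurementdategmt"], format="%Y/%m/%d %H:%M")
chrono = df.sort_values(["measurementdategmt", "station_id"]).reset_index(drop=True)

print(lexi)
print(chrono)
```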

In [80]:
# Fraction of missing values per column
London_comb.isnull().sum()/len(London_comb)
Out[80]:
measurementdategmt    0.000000
station_id            0.000000
pm2.5 (ug/m3)         0.189517
pm10 (ug/m3)          0.361196
no2 (ug/m3)           0.361799
dtype: float64
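`isnull().sum()/len(...)` is the column-wise mean of the boolean mask, so `isnull().mean()` returns the same numbers. If you group that mask by station, you get the overall ratios above broken down by station and pollutant in a single call. The sketch below uses a hypothetical toy frame with the same column names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same pollutant columns as London_comb.
df = pd.DataFrame({
    "station_id": ["LH0", "LH0", "CD9", "CD9"],
    "pm2.5 (ug/m3)": [30.2, np.nan, 28.7, 27.0],
    "pm10 (ug/m3)": [np.nan, np.nan, 32.3, np.nan],
    "no2 (ug/m3)": [15.9, 11.8, 90.6, 88.0],
})
pollutants = ["pm2.5 (ug/m3)", "pm10 (ug/m3)", "no2 (ug/m3)"]

# Overall ratio per column: identical to isnull().sum() / len(df).
overall = df[pollutants].isnull().mean()

# Per-station ratio: group the boolean mask by the station column.
per_station = df[pollutants].isnull().groupby(df["station_id"]).mean()
print(overall)
print(per_station)
```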
In [78]:
London_comb.groupby(['station_id']).groups
Out[78]:
{'BL0': Int64Index([    10,     39,     47,     83,    110,    117,    159,    166,
                203,    210,
             ...
             260113, 260120, 260158, 260167, 260191, 260227, 260260, 260272,
             260287, 260321],
            dtype='int64', length=10897),
 'BX1': Int64Index([    14,     40,     67,     69,     95,    127,    146,    164,
                188,    216,
             ...
             260094, 260124, 260145, 260185, 260194, 260229, 260257, 260284,
             260297, 260313],
            dtype='int64', length=10920),
 'BX9': Int64Index([    13,     26,     48,     76,    113,    134,    141,    182,
                205,    223,
             ...
             260103, 260126, 260159, 260181, 260199, 260221, 260242, 260282,
             260304, 260310],
            dtype='int64', length=10920),
 'CD1': Int64Index([    11,     45,     46,     91,    102,    124,    147,    167,
                196,    221,
             ...
             260116, 260130, 260144, 260172, 260212, 260222, 260238, 260267,
             260289, 260325],
            dtype='int64', length=10897),
 'CD9': Int64Index([     1,     29,     68,     75,    108,    133,    145,    169,
                193,    213,
             ...
             260114, 260133, 260162, 260168, 260198, 260231, 260252, 260261,
             260294, 260329],
            dtype='int64', length=10897),
 'CR8': Int64Index([    16,     28,     50,     86,    112,    120,    154,    177,
                201,    212,
             ...
             260096, 260138, 260141, 260179, 260211, 260215, 260251, 260280,
             260285, 260311],
            dtype='int64', length=10920),
 'CT2': Int64Index([ 17128,  17158,  17167,  17201,  17216,  17241,  17268,  17299,
              17305,  17351,
             ...
             260109, 260135, 260152, 260188, 260205, 260214, 260246, 260266,
             260308, 260328],
            dtype='int64', length=9504),
 'CT3': Int64Index([    18,     32,     55,     80,    101,    115,    155,    178,
                200,    218,
             ...
             260093, 260118, 260148, 260174, 260195, 260232, 260255, 260268,
             260298, 260326],
            dtype='int64', length=10904),
 'GB0': Int64Index([    17,     33,     52,     81,    100,    116,    160,    161,
                202,    208,
             ...
             260115, 260131, 260161, 260178, 260201, 260218, 260239, 260273,
             260295, 260331],
            dtype='int64', length=10920),
 'GN0': Int64Index([     3,     35,     60,     72,    105,    121,    153,    173,
                191,    207,
             ...
             260106, 260139, 260151, 260182, 260206, 260219, 260244, 260275,
             260305, 260317],
            dtype='int64', length=10897),
 'GN3': Int64Index([     6,     27,     49,     84,     93,    131,    156,    180,
                194,    228,
             ...
             260100, 260121, 260153, 260180, 260207, 260235, 260243, 260270,
             260286, 260324],
            dtype='int64', length=10897),
 'GR4': Int64Index([     9,     36,     62,     70,    111,    122,    152,    174,
                198,    214,
             ...
             260095, 260136, 260156, 260177, 260203, 260220, 260247, 260262,
             260296, 260315],
            dtype='int64', length=10897),
 'GR9': Int64Index([     5,     41,     56,     85,     97,    126,    139,    181,
                195,    209,
             ...
             260107, 260127, 260164, 260165, 260193, 260225, 260259, 260263,
             260299, 260318],
            dtype='int64', length=10897),
 'HR1': Int64Index([    22,     43,     66,     87,    107,    123,    149,    171,
                184,    215,
             ...
             260129, 260149, 260183, 260200, 260217, 260258, 260283, 260290,
             260316, 260333],
            dtype='int64', length=10921),
 'HV1': Int64Index([     7,     24,     58,     71,    114,    136,    151,    172,
                186,    217,
             ...
             260099, 260137, 260147, 260184, 260192, 260224, 260256, 260277,
             260288, 260332],
            dtype='int64', length=10897),
 'KC1': Int64Index([    12,     30,     63,     78,     96,    118,    142,    162,
                192,    227,
             ...
             260097, 260125, 260157, 260186, 260208, 260213, 260241, 260281,
             260302, 260314],
            dtype='int64', length=10920),
 'KF1': Int64Index([     2,     25,     64,     90,    109,    135,    143,    179,
                199,    219,
             ...
             260110, 260117, 260160, 260187, 260202, 260216, 260245, 260279,
             260306, 260312],
            dtype='int64', length=10897),
 'LH0': Int64Index([     0,     38,     61,     77,     98,    137,    148,    168,
                197,    224,
             ...
             260134, 260150, 260176, 260209, 260234, 260248, 260265, 260292,
             260327, 260334],
            dtype='int64', length=10921),
 'LW2': Int64Index([     4,     23,     65,     89,    104,    132,    138,    176,
                189,    229,
             ...
             260108, 260128, 260154, 260169, 260189, 260233, 260253, 260271,
             260300, 260322],
            dtype='int64', length=10897),
 'MY7': Int64Index([     8,     31,     53,     79,     94,    128,    140,    183,
                204,    225,
             ...
             260098, 260123, 260163, 260175, 260196, 260236, 260249, 260278,
             260291, 260309],
            dtype='int64', length=10897),
 'RB7': Int64Index([    15,     34,     57,     74,     99,    119,    144,    165,
                206,    222,
             ...
             260102, 260119, 260155, 260171, 260204, 260226, 260237, 260264,
             260307, 260330],
            dtype='int64', length=10920),
 'ST5': Int64Index([    20,     44,     54,     88,     92,    130,    157,    175,
                190,    220,
             ...
             260111, 260122, 260142, 260166, 260190, 260228, 260250, 260274,
             260293, 260319],
            dtype='int64', length=10897),
 'TD5': Int64Index([    19,     37,     59,     73,    103,    129,    150,    163,
                185,    211,
             ...
             260101, 260140, 260146, 260173, 260197, 260223, 260240, 260276,
             260303, 260320],
            dtype='int64', length=10904),
 'TH4': Int64Index([    21,     42,     51,     82,    106,    125,    158,    170,
                187,    226,
             ...
             260112, 260132, 260143, 260170, 260210, 260230, 260254, 260269,
             260301, 260323],
            dtype='int64', length=10897)}
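`groups` maps each station to the index *labels* of its rows. Passing those labels to `.iloc`, which expects positions, only works here because `London_comb` was re-indexed to `0..n-1`. Both `.loc` and direct iteration over the `groupby` object avoid that dependency. The sketch below uses a hypothetical toy frame whose index is not `0..n-1`:

```python
import pandas as pd

# Toy frame whose labels differ from positions (as after a plain concat).
df = pd.DataFrame(
    {"station_id": ["LH0", "CD9", "LH0"], "pm2.5 (ug/m3)": [30.2, 28.7, 25.4]},
    index=[10, 11, 12],
)
groups = df.groupby("station_id").groups  # {"CD9": [11], "LH0": [10, 12]}

# .loc selects by label, which is what groups contains.
lh0 = df.loc[groups["LH0"]]

# df.iloc[groups["LH0"]] would raise IndexError: positions 10 and 12 do not exist.

# Iterating the groupby yields (key, sub-frame) pairs with no indexing at all.
sizes = {key: len(sub) for key, sub in df.groupby("station_id")}
print(lh0)
print(sizes)
```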
In [79]:
import gc
gc.collect()
# One PM2.5 plot per London station (x-axis: London_comb index)
for key, temp_data in London_comb.groupby('station_id'):
    plt.figure(figsize=(20,10))  # fresh figure per station
    plt.title(key,fontsize=18)
    plt.plot(temp_data['pm2.5 (ug/m3)'])
    plt.xticks(fontsize=15)
    plt.show()
[Figures: PM2.5 time series, one plot per London station_id]
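Each per-station plot above uses the row index as its x-axis. An alternative is to draw all stations as small multiples that share a real time axis, which requires parsing `measurementdategmt` first. The sketch below uses hypothetical toy data. The `Agg` backend is there only so the sketch runs without a display; leave it out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data: 48 hourly PM2.5 readings for each of two stations.
times = pd.date_range("2017-01-01", periods=48, freq="60min")
df = pd.DataFrame({
    "measurementdategmt": np.tile(times, 2),
    "station_id": np.repeat(["LH0", "TH4"], 48),
    "pm2.5 (ug/m3)": np.random.default_rng(0).uniform(0, 60, 96),
})

n = df["station_id"].nunique()
# One row of axes per station, all sharing the same time axis.
fig, axes = plt.subplots(n, 1, figsize=(20, 3 * n), sharex=True, squeeze=False)
for ax, (key, sub) in zip(axes[:, 0], df.groupby("station_id")):
    ax.plot(sub["measurementdategmt"], sub["pm2.5 (ug/m3)"])
    ax.set_title(key, fontsize=18)
    ax.set_ylabel("PM2.5 (ug/m3)")
fig.tight_layout()
```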
In [143]:
# Stations whose PM2.5 missing-value ratio exceeds 0.15
London_missing_pm25=[]
groups=London_comb.groupby('station_id').groups  # {station_id: index labels of its rows}
for key in groups:
    temp_data=London_comb.loc[groups[key]]  # .loc, because groups holds labels, not positions
    ratio=float(temp_data['pm2.5 (ug/m3)'].isnull().mean())
    if ratio > 0.15:
        print(key)
        London_missing_pm25.append(key)
ST5
LW2
KF1
GR4
HR1
KC1
CT3
BX1
LH0
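The ratio above only counts NaN values in rows that exist. The `groups` output shows that stations have different row counts: CT2 has 9504 rows, while the other stations have between 10897 and 10921. Hours with no row at all are therefore not counted as missing. The sketch below uses hypothetical toy rows and assumes hourly data with one row per station and hour. It reindexes each station onto a full hourly range before computing the ratio:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: station "A" has one NaN and one absent hour (02:00).
df = pd.DataFrame({
    "measurementdategmt": pd.to_datetime(
        ["2017-01-01 00:00", "2017-01-01 01:00", "2017-01-01 03:00",
         "2017-01-01 00:00", "2017-01-01 01:00", "2017-01-01 02:00", "2017-01-01 03:00"]),
    "station_id": ["A", "A", "A", "B", "B", "B", "B"],
    "pm2.5 (ug/m3)": [10.0, np.nan, 12.0, 5.0, 6.0, 7.0, 8.0],
})

# Full hourly grid spanning the whole dataset (assumes hourly sampling).
full_hours = pd.date_range(df["measurementdategmt"].min(),
                           df["measurementdategmt"].max(), freq="60min")

def missing_ratio(sub: pd.DataFrame) -> float:
    # Absent hours become NaN after reindexing onto the full grid
    # (reindex requires one row per timestamp within a station).
    series = sub.set_index("measurementdategmt")["pm2.5 (ug/m3)"].reindex(full_hours)
    return float(series.isnull().mean())

ratios = {key: missing_ratio(sub) for key, sub in df.groupby("station_id")}
flagged = sorted(k for k, r in ratios.items() if r > 0.15)
print(ratios)   # station A: 2 of 4 hours missing
print(flagged)
```

On these toy rows, the original row-based ratio would give 1/3 for station "A", while the grid-based ratio gives 1/2.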

Labeling stations on the map (back to geographic EDA)

The stations whose PM2.5 missing-value ratio exceeds 0.15 are marked with red markers on the map below:

In [146]:
import folium
# Red markers: stations whose PM2.5 missing-value ratio exceeds 0.15; default markers otherwise
map_london_1 = folium.Map(location=temp_pd.iloc[0].values.tolist(), zoom_start=9, tiles='Stamen Terrain')
for key in location_Lodon:
    if key in London_missing_pm25:
        folium.Marker(location=location_Lodon[key],
                      popup='Missing value ratio > 0.15 Location',
                      icon=folium.Icon(color='red', icon='info-sign')).add_to(map_london_1)
    else:
        folium.Marker(location=location_Lodon[key]).add_to(map_london_1)
map_london_1
Out[146]:

VIF